Amith Ananthram

I'm a PhD candidate in Computer Science at Columbia University, advised by Professor Kathleen McKeown. My work explores vision-language models, in particular the strengths and limitations of different approaches to multimodal alignment. Most recently, my focus has been on detailed image description with an emphasis on works of art.

Before returning to graduate school, I built financial products at Stripe and Wealthfront as a full-stack software engineer, leading cross-functional projects from design through launch. I enjoy building reliable, maintainable systems that serve end users well.

Email  /  CV  /  LinkedIn  /  Google Scholar


Research Interests

Architectures, pre/post-training, and evaluation methods for vision-language models. Language and its role in vision.

Selected Publications

A complete list is available in my CV.

PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
Amith Ananthram, Elias Stengel-Eskin, Lorena A. Bradford, Julia Demarest, Adam Purvis, Keith Krut, Rina Elster Pantalony, Mohit Bansal, Kathleen McKeown
Under submission, 2025

  • Developed PoSh, an interpretable & replicable metric for detailed image descriptions.
  • Introduced DOCENT, a new dataset of artwork with expert descriptions and judgments from art history students. DOCENT enables evaluating both detailed image description metrics and detailed image descriptions themselves.
  • Part of an ongoing collaboration with a team at the National Gallery of Art to expand the accessibility of their collection.

Links: paper / metric (PoSh) / datasets (DOCENT) / huggingface (DOCENT)

Mining Contextualized Visual Associations from Images for Creativity Understanding
Ananya Sahu, Amith Ananthram, Kathleen McKeown
INLG, 2025   Best Long Paper

  • Developed a scalable method for mining contextualized visual associations from unlabeled images.
  • Demonstrated improved zero-shot performance in multimodal creative domains by fine-tuning on mined associations.

Links: paper

See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal, Kathleen McKeown
ICLR, 2025

  • Characterized Western bias in vision-language models across visual tasks.
  • Identified language diversity in pre-training as a key factor in cultural bias, showing that inference in culturally-aligned languages reduces bias most effectively when those languages were well-represented during text-only pre-training.

Links: paper / code / poster

Data Caricatures: On the Representation of African American Language in Pretraining Corpora
Nicholas Deas, Blake Vente, Amith Ananthram, Jessica A. Grieser, Desmond Patton, Shana Kleiner, James Shepard, Kathleen McKeown
ACL, 2025

  • Revealed severe underrepresentation of African American Language (AAL) in pretraining corpora.
  • Demonstrated quality issues in AAL representation (harmful stereotypes) that are exacerbated by automated filters.

Links: paper

Enhancing Multimodal Affective Analysis with Learned Live Comment Features
Zhaoyuan Deng, Amith Ananthram, Kathleen McKeown
AAAI, 2025

  • Created the LCAffect dataset containing 11 million real-time comments for English and Chinese videos.
  • Developed a contrastive learning approach to generate synthetic live comment features from video encoders, achieving state-of-the-art performance on affective analysis in both English and Chinese.

Links: paper

FeelingBlue: A Corpus for Understanding the Emotional Connotation of Color in Context
Amith Ananthram, Olivia Winn, Smaranda Muresan
TACL, 2023   (Presented at ACL 2023)

  • Introduced FeelingBlue, a dataset of artwork annotated with relative emotion-intensity rankings and rationales.
  • Developed a neural ensemble model that recolors images to enhance specific emotions and justifies changes in text.

Links: paper / code / huggingface (FeelingBlue) / poster

Industry Experience

Stripe, Software Engineer (Levels 2–3), San Francisco, CA   |   Feb 2018 – Aug 2019

Wealthfront, Software Engineer (Levels 1–3), Redwood City, CA   |   Aug 2014 – Oct 2017

Teaching

  • Head TA, Language Generation Seminar (Columbia, Fall 2022)
  • Teaching Assistant, Natural Language Processing (Columbia, Fall 2021)