Amith Ananthram

I'm a PhD candidate in Computer Science at Columbia University, advised by Professor Kathleen McKeown. My work explores vision-language models, in particular the strengths and limitations of language-mediated vision. Most recently, my focus has been on detailed image description with an emphasis on works of art.

Before returning to graduate school, I built financial products at Stripe and Wealthfront as a full-stack software engineer. I led cross-functional projects that paired thoughtful user experiences with rigorous engineering, and I enjoy building reliable, maintainable systems that deliver real value to end users.

Email  /  CV  /  LinkedIn  /  Google Scholar  /  GitHub  /  Hugging Face

A headshot of Amith Ananthram

Research Interests

Architectures, pre-/post-training, personalization, and evaluation methods for image and video understanding in LLMs; the strengths and limitations of language-mediated vision.

Recent News

January 2026 Our paper "PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions" has been accepted at ICLR 2026! I'll be at the conference in Brazil -- would love to chat about research and job opportunities!

Selected Publications

A complete list is available in my CV.

PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
Amith Ananthram, Elias Stengel-Eskin, Lorena A. Bradford, Julia Demarest, Adam Purvis, Keith Krut, Rina Elster Pantalony, Mohit Bansal, Kathleen McKeown
ICLR, 2026

  • Developed PoSh, an interpretable & replicable metric for detailed image descriptions.
  • Introduced DOCENT, a new dataset of artwork with expert descriptions and judgments from art history students. DOCENT enables evaluating both detailed image description metrics and detailed image descriptions themselves.
  • Part of an ongoing collaboration with a team at the National Gallery of Art to expand accessibility in their collection.

Links: paper / metric (PoSh) / datasets (DOCENT) / huggingface (DOCENT)

datasets post-training evaluation images
Mining Contextualized Visual Associations from Images for Creativity Understanding
Ananya Sahu, Amith Ananthram, Kathleen McKeown
INLG, 2025   Best Long Paper

  • Developed a scalable method for mining contextualized visual associations from unlabeled images.
  • Demonstrated improved zero-shot performance in multimodal creative domains by fine-tuning on mined associations.

Links: paper

data synthesis pre-training images
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal, Kathleen McKeown
ICLR, 2025

  • Characterized Western bias in vision-language models across visual tasks.
  • Identified language diversity in pre-training as a key factor in cultural bias, showing that inference in culturally-aligned languages reduces bias most effectively when those languages were well-represented during text-only pre-training.

Links: paper / code / poster

bias multiculturalism multilingualism pre-training evaluation images
Data Caricatures: On the Representation of African American Language in Pretraining Corpora
Nicholas Deas, Blake Vente, Amith Ananthram, Jessica A. Grieser, Desmond Patton, Shana Kleiner, James Shepard, Kathleen McKeown
ACL, 2025

  • Revealed severe underrepresentation of African American Language (AAL) in pretraining corpora.
  • Demonstrated quality issues in AAL representation (harmful stereotypes) that are exacerbated by automated filters.

Links: paper

bias pre-training evaluation
Enhancing Multimodal Affective Analysis with Learned Live Comment Features
Zhaoyuan Deng, Amith Ananthram, Kathleen McKeown
AAAI, 2025

  • Created the LCAffect dataset containing 11 million real-time comments for English and Chinese videos.
  • Developed a contrastive learning approach to generate synthetic live comment features from video encoders, achieving state-of-the-art performance on affective analysis in both English and Chinese.

Links: paper

datasets pre-training videos
FeelingBlue: a Corpus for Understanding the Emotional Connotation of Color in Context
Amith Ananthram, Olivia Winn, Smaranda Muresan
TACL, 2023   (Presented at ACL 2023)

  • Introduced FeelingBlue, a dataset of artwork annotated with relative emotion-intensity rankings and rationales.
  • Developed a neural ensemble model that recolors images to enhance specific emotions and justifies changes in text.

Links: paper / code / huggingface (FeelingBlue) / poster

datasets pre-training images

Industry Experience

Stripe, Software Engineer (Levels 2–3), San Francisco, CA   |   Feb 2018 – Aug 2019

Wealthfront, Software Engineer (Levels 1–3), Redwood City, CA   |   Aug 2014 – Oct 2017

Teaching

  • Head TA, Language Generation Seminar (Columbia, Fall 2022)
  • Teaching Assistant, Natural Language Processing (Columbia, Fall 2021)