Amith Ananthram
I'm a PhD candidate in Computer Science at Columbia University, advised by Professor Kathleen McKeown. My work explores vision-language models, in particular the strengths and limitations of language-mediated vision. Most recently, my focus has been on detailed image description with an emphasis on works of art.
Before returning to graduate school I built financial products at Stripe and Wealthfront as a full-stack software engineer. I led cross-functional projects that delivered thoughtful experiences with rigorous technical solutions. I enjoy building reliable, maintainable systems that drive value for end users.
Email / CV / LinkedIn / Google Scholar / GitHub / Hugging Face
Research Interests
Architectures, pre/post-training, personalization, and evaluation methods for image and video understanding in LLMs; the strengths and limitations of language-mediated vision.
Selected Publications
A complete list is available in my CV.
PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
Amith Ananthram, Elias Stengel-Eskin, Lorena A. Bradford, Julia Demarest, Adam Purvis, Keith Krut, Rina Elster Pantalony, Mohit Bansal, Kathleen McKeown
ICLR, 2026
- Developed PoSh, an interpretable & replicable metric for detailed image descriptions.
- Introduced DOCENT, a new dataset of artwork with expert descriptions and judgments from art history students. DOCENT enables evaluating both detailed image description metrics and detailed image descriptions themselves.
- Part of an ongoing collaboration with a team at the National Gallery of Art to expand accessibility in their collection.
Links: paper / metric (PoSh) / datasets (DOCENT) / huggingface (DOCENT)
Tags: datasets / post-training / evaluation / images
Mining Contextualized Visual Associations from Images for Creativity Understanding
Ananya Sahu, Amith Ananthram, Kathleen McKeown
INLG, 2025 (Best Long Paper)
- Developed a scalable method for mining contextualized visual associations from unlabeled images.
- Demonstrated improved zero-shot performance in multimodal creative domains by fine-tuning on mined associations.
Links: paper
Tags: data synthesis / pre-training / images
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal, Kathleen McKeown
ICLR, 2025
- Characterized Western bias in vision-language models across visual tasks.
- Identified language diversity in pre-training as a key factor in cultural bias, showing that inference in culturally-aligned languages reduces bias most effectively when those languages were well-represented during text-only pre-training.
Links: paper / code / poster
Tags: bias / multiculturalism / multilingualism / pre-training / evaluation / images
Data Caricatures: On the Representation of African American Language in Pretraining Corpora
Nicholas Deas, Blake Vente, Amith Ananthram, Jessica A. Grieser, Desmond Patton, Shana Kleiner, James Shepard, Kathleen McKeown
ACL, 2025
- Revealed severe underrepresentation of African American Language (AAL) in pretraining corpora.
- Demonstrated quality issues in AAL representation (harmful stereotypes) that are exacerbated by automated filters.
Links: paper
Tags: bias / pre-training / evaluation
Enhancing Multimodal Affective Analysis with Learned Live Comment Features
Zhaoyuan Deng, Amith Ananthram, Kathleen McKeown
AAAI, 2025
- Created the LCAffect dataset containing 11 million real-time comments for English and Chinese videos.
- Developed a contrastive learning approach to generate synthetic live comment features from video encoders, achieving state-of-the-art performance on affective analysis in both English and Chinese.
Links: paper
Tags: datasets / pre-training / videos
FeelingBlue: A Corpus for Understanding the Emotional Connotation of Color in Context
Amith Ananthram, Olivia Winn, Smaranda Muresan
TACL, 2023 (Presented at ACL 2023)
- Introduced FeelingBlue, a dataset of artwork annotated with relative emotion-intensity rankings and rationales for those rankings.
- Developed a neural ensemble model that recolors images to enhance specific emotions and justifies changes in text.
Links: paper / code / huggingface (FeelingBlue) / poster
Tags: datasets / pre-training / images
Industry Experience
Stripe, Software Engineer (Levels 2–3), San Francisco, CA | Feb 2018 – Aug 2019
Wealthfront, Software Engineer (Levels 1–3), Redwood City, CA | Aug 2014 – Oct 2017
Teaching
- Head TA, Language Generation Seminar (Columbia, Fall 2022)
- Teaching Assistant, Natural Language Processing (Columbia, Fall 2021)