LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

Ho Yin Sam Ng; Edward Hsu; Aashish Anantha Ramakrishnan; Branislav Kveton; Nedim Lipka; Franck Dernoncourt; Dongwon Lee; Tong Yu; Sungchul Kim; Ryan A. Rossi; Ting-Hao Huang

doi:10.18653/v1/2025.findings-emnlp.521

Abstract

Figure captions are crucial for helping readers understand and remember a figure’s key message. Many models have been developed to generate these captions, helping authors compose better quality captions more easily. Yet, authors almost always need to revise generic AI-generated captions to match their writing style and the domain’s style, highlighting the need for personalization. Despite language models’ personalization (LaMP) advances, these technologies often focus on text-only settings and rarely address scenarios where both inputs and profiles are multimodal. This paper introduces LaMP-Cap, a dataset for personalized figure caption generation with multimodal figure profiles. For each target figure, LaMP-Cap provides not only the needed inputs, such as figure images, but also up to three other figures from the same document—each with its image, caption, and figure-mentioning paragraphs—as a profile to characterize the context. Experiments with four LLMs show that using profile information consistently helps generate captions closer to the original author-written ones. Ablation studies reveal that images in the profile are more helpful than figure-mentioning paragraphs, highlighting the advantage of using multimodal profiles over text-only ones.
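The record layout described in the abstract can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, field names, and the `profile_text` helper are hypothetical and are not taken from the released dataset or the paper's code. Only the "up to three profile figures, each with an image, a caption, and figure-mentioning paragraphs" structure comes from the abstract.

```python
from dataclasses import dataclass, field
from typing import List

# The abstract states that a profile holds "up to three" other figures.
MAX_PROFILE_FIGURES = 3


@dataclass
class ProfileFigure:
    """Another figure from the same document (hypothetical field names)."""
    image_path: str
    caption: str
    mention_paragraphs: List[str] = field(default_factory=list)


@dataclass
class CaptionTask:
    """One target figure plus its profile (hypothetical field names)."""
    target_image_path: str
    profile: List[ProfileFigure] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the profile size limit described in the abstract.
        if len(self.profile) > MAX_PROFILE_FIGURES:
            raise ValueError("a profile holds at most three figures")


def profile_text(task: CaptionTask, use_paragraphs: bool = True) -> str:
    """Flatten the textual part of the profile into prompt context.

    Setting use_paragraphs=False drops the figure-mentioning paragraphs,
    loosely mirroring the kind of ablation the abstract mentions.
    """
    lines = []
    for i, fig in enumerate(task.profile, start=1):
        lines.append(f"Profile figure {i} caption: {fig.caption}")
        if use_paragraphs:
            for para in fig.mention_paragraphs:
                lines.append(f"Profile figure {i} mention: {para}")
    return "\n".join(lines)
```

In a multimodal setup, the profile images would be passed to the model alongside this text. How that is done depends on the model interface and is left out of the sketch.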

Anthology ID: 2025.findings-emnlp.521
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9818–9832
URL: https://aclanthology.org/2025.findings-emnlp.521/
DOI: 10.18653/v1/2025.findings-emnlp.521
Cite (ACL): Ho Yin Sam Ng, Edward Hsu, Aashish Anantha Ramakrishnan, Branislav Kveton, Nedim Lipka, Franck Dernoncourt, Dongwon Lee, Tong Yu, Sungchul Kim, Ryan A. Rossi, and Ting-Hao Kenneth Huang. 2025. LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9818–9832, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles (Ng et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.521.pdf
Checklist: 2025.findings-emnlp.521.checklist.pdf
