Zixuan Huang
2025
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
Ashish Singh | Ashutosh Singh | Prateek Agarwal | Zixuan Huang | Arpita Singh | Tong Yu | Sungchul Kim | Victor Soares Bursztyn | Nesreen K. Ahmed | Puneet Mathur | Erik Learned-Miller | Franck Dernoncourt | Ryan Rossi
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures are trained on figure-caption pairs extracted from documents, many of which fall short on metrics such as helpfulness, explainability, and visual descriptiveness, so the generated captions are misaligned with reader preferences. To address this issue, we introduce FigCaps-HF, a new framework for figure-caption generation that can incorporate domain expert feedback to generate captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs, and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, 9%, and 11.4% in ROUGE, BLEU, METEOR, and CIDEr scores, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.