Automatic and Reliable Evaluation for Academic Caption-to-Figure Generation with LMMs

Guanghui Ye; Huan Zhao; Qin Zhu; Fengnan Li; Jiaqi Li; Yixian Shen; Zhonghao Ren; Zhihua Jiang

Abstract

Existing datasets for evaluating text-to-image generation focus mostly on real-life images, which makes it difficult to assess academic figure generation from real scientific captions, a hot topic in AI for Science. To fill this gap, we propose HE4AFG, a novel dataset that is the first to provide a Holistic Evaluation for Academic caption-to-Figure Generation (AFG). Specifically, HE4AFG collects real figure captions from 8 scientific domains and generates 3,900 evaluation samples (including multi-panel figures) using 5 mainstream large multimodal models (LMMs). For each sample, we provide high-quality human ratings on three aspects: scientific aesthetic (SA), topic relevance (TR), and attribute correctness (AC). Moreover, we present two trainable models: (1) HE4AFG-E, an automated Evaluation model for AFG, which generates aspect-aware training examples and then uses them to train three aspect-specific evaluation modules via contrastive learning; (2) HE4AFG-R, an automated Refinement model, which generates and utilizes feedback on the quality of the figures (e.g., unfaithful elements) to continuously improve AFG. Extensive experiments on HE4AFG demonstrate the effectiveness and performance advantages of our models.

Anthology ID: 2026.acl-long.2055
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 44406–44423
URL: https://aclanthology.org/2026.acl-long.2055/
Cite (ACL): Guanghui Ye, Huan Zhao, Qin Zhu, Fengnan Li, Jiaqi Li, Yixian Shen, Zhonghao Ren, and Zhihua Jiang. 2026. Automatic and Reliable Evaluation for Academic Caption-to-Figure Generation with LMMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44406–44423, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Automatic and Reliable Evaluation for Academic Caption-to-Figure Generation with LMMs (Ye et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2055.pdf
Checklist: 2026.acl-long.2055.checklist.pdf
