A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability

Xinyu Hu; Mingqi Gao; Li Lin; Zhenghan Yu; Xiaojun Wan

doi:10.18653/v1/2025.acl-long.1327

Abstract

In NLG meta-evaluation, evaluation metrics are typically assessed based on their consistency with humans. However, we identify some limitations in traditional NLG meta-evaluation approaches, such as issues in handling human ratings and ambiguous selections of correlation measures, which undermine the effectiveness of meta-evaluation. In this work, we propose a dual-perspective NLG meta-evaluation framework that focuses on different evaluation capabilities, thereby providing better interpretability. In addition, we introduce a method of automatically constructing the corresponding benchmarks without requiring new human annotations. Furthermore, we conduct experiments with 16 representative LLMs as the evaluators based on our proposed framework, comprehensively analyzing their evaluation performance from different perspectives.
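As a hypothetical illustration of the conventional setup the abstract refers to (not the authors' framework or benchmark), the sketch below scores two made-up metrics against made-up human ratings with two common correlation measures. All values and function names are assumptions for illustration only. It shows how the choice of correlation measure can change the conclusion about which metric is more consistent with humans.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation: sensitive to the linearity of the relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau_a(x, y):
    """Kendall tau-a: depends only on pairwise ordering; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (assumed): human ratings for five outputs and two metrics' scores.
human = [1, 2, 3, 4, 5]
metric_a = [0.10, 0.20, 0.30, 0.40, 0.50]  # linear in the human ratings
metric_b = [0.10, 0.11, 0.12, 0.13, 0.90]  # same ordering, but non-linear

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    print(name, round(pearson(human, scores), 3), round(kendall_tau_a(human, scores), 3))
```

In this toy case both metrics order the outputs exactly as the human ratings do, so Kendall's tau cannot separate them, while Pearson's r penalizes the second metric for its non-linear scale.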

Anthology ID: 2025.acl-long.1327
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 27372–27395
URL: https://aclanthology.org/2025.acl-long.1327/
DOI: 10.18653/v1/2025.acl-long.1327
Cite (ACL): Xinyu Hu, Mingqi Gao, Li Lin, Zhenghan Yu, and Xiaojun Wan. 2025. A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27372–27395, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability (Hu et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1327.pdf
