Evaluating Text Generation Quality Using Spectral Distances of Surprisal

Zhichen Liu; Yongyuan Li; Yang Xu; Yu Wang (王昱, 王雨); Yingfang Yuan; Zuhao Yang

Abstract

We propose FACE-2, a novel automatic evaluation metric for open-ended text generation that substantially improves on the recently developed Fourier analysis of cross-entropy (FACE). FACE-2 is a psycholinguistically inspired metric that extracts the dynamic patterns (spectrum) of text surprisal. On open-ended text generation tasks, FACE-2 significantly outperforms a broad set of baseline metrics in revealing the model scaling effect, which it captures for models of up to 70B parameters, while many existing metrics fail to capture this effect. FACE-2 also agrees more strongly with human preferences on a large human-annotated dataset. We advocate including metrics that mine the dynamics of likelihood when evaluating open-ended text generation, since they cover broader aspects of human language than static likelihood-based or semantic-based metrics alone. Code repository: https://github.com/CLCS-SUSTech/FACEScore.
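To make the idea of a "spectral distance of surprisal" concrete, here is a minimal sketch and not the FACE-2 implementation (for that, see the repository linked above). It assumes per-token surprisal values have already been computed by some language model. The function names `power_spectrum`, `resample`, and `spectral_distance`, the mean-centering step, the 32-point frequency grid, and the Euclidean distance are all illustrative assumptions.

```python
import cmath
import math


def power_spectrum(surprisals):
    """Naive DFT power spectrum of a mean-centred surprisal sequence.

    `surprisals` is assumed to be a list of per-token surprisal values
    (e.g. negative log-probabilities) obtained elsewhere.
    """
    n = len(surprisals)
    mean = sum(surprisals) / n
    centred = [s - mean for s in surprisals]
    spectrum = []
    # Keep only non-negative frequencies (the input is real-valued).
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def resample(spectrum, n_points=32):
    """Linearly interpolate a spectrum onto a shared grid of n_points bins,
    so that texts of different lengths can be compared (n_points >= 2)."""
    m = len(spectrum)
    if m == 1:
        return [spectrum[0]] * n_points
    out = []
    for i in range(n_points):
        pos = i * (m - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out


def spectral_distance(surprisals_a, surprisals_b, n_points=32):
    """Euclidean distance between two resampled surprisal spectra
    (an illustrative choice of distance, not necessarily the paper's)."""
    sa = resample(power_spectrum(surprisals_a), n_points)
    sb = resample(power_spectrum(surprisals_b), n_points)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))
```

In this sketch, a model-generated text whose surprisal fluctuates with a rhythm similar to that of a human-written reference would receive a small distance, while a text with flat or erratic surprisal would receive a larger one.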

Anthology ID: 2025.findings-emnlp.132
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2444–2463
URL: https://aclanthology.org/2025.findings-emnlp.132/
Cite (ACL): Zhichen Liu, Yongyuan Li, Yang Xu, Yu Wang, Yingfang Yuan, and Zuhao Yang. 2025. Evaluating Text Generation Quality Using Spectral Distances of Surprisal. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2444–2463, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Evaluating Text Generation Quality Using Spectral Distances of Surprisal (Liu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.132.pdf
Checklist: 2025.findings-emnlp.132.checklist.pdf
