A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating

Laura Zeidler, Juri Opitz, Anette Frank


Abstract
Evaluating the quality of generated text is difficult, since traditional NLG evaluation metrics, focusing more on surface form than meaning, often fail to assign appropriate scores. This is especially problematic for AMR-to-text evaluation, given the abstract nature of AMR.Our work aims to support the development and improvement of NLG evaluation metrics that focus on meaning by developing a dynamic CheckList for NLG metrics that is interpreted by being organized around meaning-relevant linguistic phenomena. Each test instance consists of a pair of sentences with their AMR graphs and a human-produced textual semantic similarity or relatedness score. Our CheckList facilitates comparative evaluation of metrics and reveals strengths and weaknesses of novel and traditional metrics. We demonstrate the usefulness of CheckList by designing a new metric GraCo that computes lexical cohesion graphs over AMR concepts. Our analysis suggests that GraCo presents an interesting NLG metric worth future investigation and that meaning-oriented NLG metrics can profit from graph-based metric components using AMR.
Anthology ID:
2022.starsem-1.14
Volume:
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics
Month:
July
Year:
2022
Address:
Seattle, Washington
Editors:
Vivi Nastase, Ellie Pavlick, Mohammad Taher Pilehvar, Jose Camacho-Collados, Alessandro Raganato
Venue:
*SEM
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Note:
Pages:
157–172
Language:
URL:
https://aclanthology.org/2022.starsem-1.14
DOI:
10.18653/v1/2022.starsem-1.14
Bibkey:
Cite (ACL):
Laura Zeidler, Juri Opitz, and Anette Frank. 2022. A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating. In Proceedings of the 11th Joint Conference on Lexical and Computational Semantics, pages 157–172, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating (Zeidler et al., *SEM 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.starsem-1.14.pdf
Data
SICKSentEval