Automatic Evaluation of Language Generation Technology Based on Structure Alignment

Katsuki Chousa, Tsutomu Hirao


Abstract
Language generation techniques require automatic evaluation to carry out efficient and reproducible experiments. While n-gram matching is standard, it fails to recognize semantically equivalent sentences that use different wording. Recent methods have addressed this issue by using contextual embeddings from pre-trained language models to compute the similarity between a reference and a hypothesis. However, these methods frequently disregard the syntax of sentences, despite its crucial role in determining meaning, and thus assign unjustifiably high scores. This paper proposes an automatic evaluation metric that considers both the words in sentences and their syntactic structures. We integrate syntactic information into the recent embedding-based approach. Experimental results obtained from two NLP tasks show that our method is at least comparable to standard baselines.
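The two weaknesses the abstract describes can be illustrated with a toy example. The sketch below is not the paper's metric: it is a minimal clipped n-gram precision, and the function name `ngram_precision` and the example sentences are assumptions made for illustration. It shows a paraphrase scoring zero on unigram overlap, and a word-order swap that reverses the meaning scoring a perfect unigram match.

```python
from collections import Counter


def ngram_precision(hypothesis, reference, n):
    """Clipped n-gram precision of a hypothesis against one reference.

    Hypothetical helper for illustration; whitespace tokenization only.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


reference = "the dog bites the man"
paraphrase = "a hound nips a person"   # similar meaning, different words
reordered = "the man bites the dog"    # same words, reversed meaning

paraphrase_score = ngram_precision(paraphrase, reference, 1)
reordered_score = ngram_precision(reordered, reference, 1)
reordered_bigram_score = ngram_precision(reordered, reference, 2)

print(paraphrase_score, reordered_score, reordered_bigram_score)
```

Unigram overlap cannot see that the reordered sentence changes who bites whom, and bigrams only partly penalize it. A metric that ignores sentence structure can be too generous in the same way, which is the gap that motivates adding syntactic information.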
Anthology ID:
2025.coling-main.512
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
7663–7670
URL:
https://aclanthology.org/2025.coling-main.512/
Cite (ACL):
Katsuki Chousa and Tsutomu Hirao. 2025. Automatic Evaluation of Language Generation Technology Based on Structure Alignment. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7663–7670, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Automatic Evaluation of Language Generation Technology Based on Structure Alignment (Chousa & Hirao, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.512.pdf