@inproceedings{huidrom-etal-2025-assessing,
    title = "Assessing Semantic Consistency in {D}ata{-}to{-}{T}ext Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics",
    author = "Huidrom, Rudali and
      Lorandi, Michela and
      Mille, Simon and
      Thomson, Craig and
      Belz, Anya",
    editor = "Flek, Lucie and
      Narayan, Shashi and
      Phương, Lê Hồng and
      Pei, Jiahuan",
    booktitle = "Proceedings of the 18th International Natural Language Generation Conference",
    month = oct,
    year = "2025",
    address = "Hanoi, Vietnam",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.inlg-main.6/",
    pages = "98--107",
    abstract = "Ensuring semantic consistency between semantic-triple inputs and generated text is crucial in data{-}to{-}text generation, but continues to pose challenges both during generation and in evaluation. In order to assess how accurately semantic consistency can currently be assessed, we meta-evaluate 29 different evaluation methods in terms of their ability to predict human semantic-consistency ratings. The evaluation methods include embeddings{-}based, overlap{-}based, and edit{-}distance metrics, as well as learned regressors and a prompted `LLM{-}as{-}judge' protocol. We meta-evaluate on two datasets: the WebNLG 2017 human evaluation dataset, and a newly created WebNLG-style dataset that none of the methods can have seen during training. We find that none of the traditional textual similarity metrics or the pre-Transformer model-based metrics are suitable for the task of semantic consistency assessment. LLM-based methods perform well on the whole, but best correlations with human judgments still lag behind those seen in other text generation tasks."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="huidrom-etal-2025-assessing">
    <titleInfo>
      <title>Assessing Semantic Consistency in Data-to-Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Rudali</namePart>
      <namePart type="family">Huidrom</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Michela</namePart>
      <namePart type="family">Lorandi</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Simon</namePart>
      <namePart type="family">Mille</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Craig</namePart>
      <namePart type="family">Thomson</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Anya</namePart>
      <namePart type="family">Belz</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-10</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 18th International Natural Language Generation Conference</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Lucie</namePart>
        <namePart type="family">Flek</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Shashi</namePart>
        <namePart type="family">Narayan</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Lê</namePart>
        <namePart type="given">Hồng</namePart>
        <namePart type="family">Phương</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Jiahuan</namePart>
        <namePart type="family">Pei</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Hanoi, Vietnam</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Ensuring semantic consistency between semantic-triple inputs and generated text is crucial in data-to-text generation, but continues to pose challenges both during generation and in evaluation. In order to assess how accurately semantic consistency can currently be assessed, we meta-evaluate 29 different evaluation methods in terms of their ability to predict human semantic-consistency ratings. The evaluation methods include embeddings-based, overlap-based, and edit-distance metrics, as well as learned regressors and a prompted ‘LLM-as-judge’ protocol. We meta-evaluate on two datasets: the WebNLG 2017 human evaluation dataset, and a newly created WebNLG-style dataset that none of the methods can have seen during training. We find that none of the traditional textual similarity metrics or the pre-Transformer model-based metrics are suitable for the task of semantic consistency assessment. LLM-based methods perform well on the whole, but best correlations with human judgments still lag behind those seen in other text generation tasks.</abstract>
    <identifier type="citekey">huidrom-etal-2025-assessing</identifier>
    <location>
      <url>https://aclanthology.org/2025.inlg-main.6/</url>
    </location>
    <part>
      <date>2025-10</date>
      <extent unit="page">
        <start>98</start>
        <end>107</end>
      </extent>
    </part>
  </mods>
</modsCollection>

%0 Conference Proceedings
%T Assessing Semantic Consistency in Data-to-Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics
%A Huidrom, Rudali
%A Lorandi, Michela
%A Mille, Simon
%A Thomson, Craig
%A Belz, Anya
%Y Flek, Lucie
%Y Narayan, Shashi
%Y Phương, Lê Hồng
%Y Pei, Jiahuan
%S Proceedings of the 18th International Natural Language Generation Conference
%D 2025
%8 October
%I Association for Computational Linguistics
%C Hanoi, Vietnam
%F huidrom-etal-2025-assessing
%X Ensuring semantic consistency between semantic-triple inputs and generated text is crucial in data-to-text generation, but continues to pose challenges both during generation and in evaluation. In order to assess how accurately semantic consistency can currently be assessed, we meta-evaluate 29 different evaluation methods in terms of their ability to predict human semantic-consistency ratings. The evaluation methods include embeddings-based, overlap-based, and edit-distance metrics, as well as learned regressors and a prompted ‘LLM-as-judge’ protocol. We meta-evaluate on two datasets: the WebNLG 2017 human evaluation dataset, and a newly created WebNLG-style dataset that none of the methods can have seen during training. We find that none of the traditional textual similarity metrics or the pre-Transformer model-based metrics are suitable for the task of semantic consistency assessment. LLM-based methods perform well on the whole, but best correlations with human judgments still lag behind those seen in other text generation tasks.
%U https://aclanthology.org/2025.inlg-main.6/
%P 98-107

Markdown (Informal)
[Assessing Semantic Consistency in Data-to-Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics](https://aclanthology.org/2025.inlg-main.6/) (Huidrom et al., INLG 2025)

ACL
Rudali Huidrom, Michela Lorandi, Simon Mille, Craig Thomson, and Anya Belz. 2025. Assessing Semantic Consistency in Data-to-Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics. In Proceedings of the 18th International Natural Language Generation Conference, pages 98–107, Hanoi, Vietnam. Association for Computational Linguistics.