Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator

Nicolas Garneau; Luc Lamontagne

doi:10.18653/v1/2021.eval4nlp-1.6


Abstract

In this paper, we introduce a new embedding-based metric relying on trainable ranking models to evaluate the semantic accuracy of neural data-to-text generators. This metric is especially well suited to semantically and factually assess the performance of a text generator when tables can be associated with multiple references and table values contain textual utterances. We first present how one can implement and further specialize the metric by training the underlying ranking models on a legal Data-to-Text dataset. We show how it may provide a more robust evaluation than other evaluation schemes in challenging settings using a dataset comprising paraphrases between the table values and their respective references. Finally, we evaluate its generalization capabilities on a well-known dataset, WebNLG, by comparing it with human evaluation and a recently introduced metric based on natural language inference. We then illustrate how it naturally characterizes, both quantitatively and qualitatively, omissions and hallucinations.

Anthology ID: 2021.eval4nlp-1.6
Volume: Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Yang Gao, Steffen Eger, Wei Zhao, Piyawat Lertvittayakumjorn, Marina Fomicheva
Venue: Eval4NLP
Publisher: Association for Computational Linguistics
Pages: 51–61
URL: https://aclanthology.org/2021.eval4nlp-1.6/
DOI: 10.18653/v1/2021.eval4nlp-1.6
Cite (ACL): Nicolas Garneau and Luc Lamontagne. 2021. Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 51–61, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Trainable Ranking Models to Evaluate the Semantic Accuracy of Data-to-Text Neural Generator (Garneau & Lamontagne, Eval4NLP 2021)
PDF: https://aclanthology.org/2021.eval4nlp-1.6.pdf
Video: https://aclanthology.org/2021.eval4nlp-1.6.mp4
