A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

Craig Thomson, Ehud Reiter


Abstract
Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries. We then show how our gold standard evaluation can be used to validate automated metrics.
Anthology ID:
2020.inlg-1.22
Volume:
Proceedings of the 13th International Conference on Natural Language Generation
Month:
December
Year:
2020
Address:
Dublin, Ireland
Editors:
Brian Davis, Yvette Graham, John Kelleher, Yaji Sripada
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
158–168
URL:
https://aclanthology.org/2020.inlg-1.22
DOI:
10.18653/v1/2020.inlg-1.22
Cite (ACL):
Craig Thomson and Ehud Reiter. 2020. A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems. In Proceedings of the 13th International Conference on Natural Language Generation, pages 158–168, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems (Thomson & Reiter, INLG 2020)
PDF:
https://aclanthology.org/2020.inlg-1.22.pdf
Code:
nlgcat/evaluating_accuracy