Analysing Data-To-Text Generation Benchmarks

Laura Perez-Beltrachini, Claire Gardent


Abstract
A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.
Anthology ID:
W17-3537
Volume:
Proceedings of the 10th International Conference on Natural Language Generation
Month:
September
Year:
2017
Address:
Santiago de Compostela, Spain
Editors:
Jose M. Alonso, Alberto Bugarín, Ehud Reiter
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
238–242
URL:
https://aclanthology.org/W17-3537/
DOI:
10.18653/v1/W17-3537
Cite (ACL):
Laura Perez-Beltrachini and Claire Gardent. 2017. Analysing Data-To-Text Generation Benchmarks. In Proceedings of the 10th International Conference on Natural Language Generation, pages 238–242, Santiago de Compostela, Spain. Association for Computational Linguistics.
Cite (Informal):
Analysing Data-To-Text Generation Benchmarks (Perez-Beltrachini & Gardent, INLG 2017)
PDF:
https://aclanthology.org/W17-3537.pdf