Analysing Data-To-Text Generation Benchmarks

Laura Perez-Beltrachini, Claire Gardent


Abstract
A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.
Anthology ID:
W17-3537
Volume:
Proceedings of the 10th International Conference on Natural Language Generation
Month:
September
Year:
2017
Address:
Santiago de Compostela, Spain
Venues:
INLG | WS
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
238–242
URL:
https://aclanthology.org/W17-3537
DOI:
10.18653/v1/W17-3537
PDF:
https://aclanthology.org/W17-3537.pdf