Exploring the impact of data representation on neural data-to-text generation

David M. Howcroft, Lewis N. Watson, Olesia Nedopas, Dimitra Gkatzia


Abstract
A relatively under-explored area in research on neural natural language generation is the impact of the data representation on text quality. Here we report experiments on two leading input representations for data-to-text generation: attribute-value pairs and Resource Description Framework (RDF) triples. Evaluating the performance of encoder-decoder seq2seq models as well as recent large language models (LLMs) with both automated metrics and human evaluation, we find that the input representation does not seem to have a large impact on the performance of either purpose-built seq2seq models or LLMs. Finally, we present an error analysis of the texts generated by the LLMs and provide some insights into where these models fail.
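As an illustration of the two input representations the abstract contrasts, the same single-entity content can be encoded either as a flat attribute-value mapping (as in E2E-challenge-style meaning representations) or as subject-predicate-object RDF triples (as in WebNLG-style inputs). The entity and attribute names below are invented examples, not data from the paper:

```python
# Hypothetical example contrasting the two data-to-text input
# representations; names and values are illustrative, not the
# paper's actual data.

# Attribute-value pairs: a flat mapping describing one entity.
attribute_value = {
    "name": "The Golden Curry",
    "eatType": "restaurant",
    "food": "Indian",
    "area": "city centre",
}

# RDF triples: (subject, predicate, object) statements.
rdf_triples = [
    ("The Golden Curry", "eatType", "restaurant"),
    ("The Golden Curry", "food", "Indian"),
    ("The Golden Curry", "area", "city centre"),
]

def pairs_to_triples(subject, pairs):
    """Convert an attribute-value mapping (excluding the entity's
    name attribute) into equivalent RDF-style triples."""
    return [(subject, attr, val)
            for attr, val in pairs.items() if attr != "name"]

# For single-entity inputs the two representations carry the
# same content, which is why the comparison is about form.
assert pairs_to_triples(attribute_value["name"], attribute_value) == rdf_triples
```

For single-entity inputs the conversion is lossless in both directions; triples generalise more naturally to inputs mentioning several entities.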
Anthology ID:
2024.inlg-main.20
Volume:
Proceedings of the 17th International Natural Language Generation Conference
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
243–253
URL:
https://aclanthology.org/2024.inlg-main.20
Cite (ACL):
David M. Howcroft, Lewis N. Watson, Olesia Nedopas, and Dimitra Gkatzia. 2024. Exploring the impact of data representation on neural data-to-text generation. In Proceedings of the 17th International Natural Language Generation Conference, pages 243–253, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Exploring the impact of data representation on neural data-to-text generation (Howcroft et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-main.20.pdf
Supplementary attachment:
2024.inlg-main.20.Supplementary_Attachment.pdf