Handling Rare Items in Data-to-Text Generation

Anastasia Shimorina, Claire Gardent


Abstract
Neural approaches to data-to-text generation generally handle rare input items using either delexicalisation or a copy mechanism. We investigate the relative impact of these two methods on two datasets (E2E and WebNLG) under two evaluation settings. We show (i) that rare items strongly impact performance; (ii) that combining delexicalisation and copying yields the strongest improvement; (iii) that copying underperforms for rare and unseen items; and (iv) that the impact of these two mechanisms varies greatly depending on how the dataset is constructed and on how it is split into train, dev and test.
Anthology ID:
W18-6543
Volume:
Proceedings of the 11th International Conference on Natural Language Generation
Month:
November
Year:
2018
Address:
Tilburg University, The Netherlands
Editors:
Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
360–370
URL:
https://aclanthology.org/W18-6543/
DOI:
10.18653/v1/W18-6543
Cite (ACL):
Anastasia Shimorina and Claire Gardent. 2018. Handling Rare Items in Data-to-Text Generation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 360–370, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal):
Handling Rare Items in Data-to-Text Generation (Shimorina & Gardent, INLG 2018)
PDF:
https://aclanthology.org/W18-6543.pdf
Code
shimorina/webnlg-dataset
Data
WebNLG