The E2E Dataset: New Challenges For End-to-End Generation

Jekaterina Novikova, Ondřej Dušek, Verena Rieser


Abstract
This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain; the dataset is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.
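The content-selection challenge can be illustrated with a small, hypothetical Python sketch. It assumes that each meaning representation (MR) is a comma-separated list of `attribute[value]` pairs, as in the publicly released E2E files (for example `name[The Vaults], eatType[pub]`). It then checks which attribute values a human reference mentions verbatim. The helpers `parse_mr` and `realised_values` are illustrative names, not code from the paper or the dataset release. The verbatim-match test is a deliberately crude heuristic: it misses paraphrases such as "cheap" for `priceRange[less than £20]`.

```python
import re
from typing import Dict

# Assumption: MRs are comma-separated "attribute[value]" pairs.
MR_PATTERN = re.compile(r"\s*([^\[\],]+?)\s*\[([^\]]*)\]")


def parse_mr(mr: str) -> Dict[str, str]:
    """Parse an MR string into an attribute -> value mapping (hypothetical helper)."""
    return {attr: value for attr, value in MR_PATTERN.findall(mr)}


def realised_values(mr: Dict[str, str], ref: str) -> Dict[str, bool]:
    """Crude heuristic: an attribute counts as realised if its value string
    occurs verbatim (case-insensitively) in the reference text."""
    text = ref.lower()
    return {attr: value.lower() in text for attr, value in mr.items()}


if __name__ == "__main__":
    mr = parse_mr(
        "name[The Vaults], eatType[pub], priceRange[more than £30], near[Café Adriatic]"
    )
    ref = "The Vaults is a pub near Café Adriatic."
    # The reference omits the price range, so a system trained on such
    # references must learn which attributes to verbalise.
    print(realised_values(mr, ref))
```

The sketch only flags exact mentions, so it yields a rough lower bound on how much of an MR a reference covers.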
Anthology ID:
W17-5525
Volume:
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Month:
August
Year:
2017
Address:
Saarbrücken, Germany
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
201–206
URL:
https://aclanthology.org/W17-5525
DOI:
10.18653/v1/W17-5525
Cite (ACL):
Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. 2017. The E2E Dataset: New Challenges For End-to-End Generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–206, Saarbrücken, Germany. Association for Computational Linguistics.
Cite (Informal):
The E2E Dataset: New Challenges For End-to-End Generation (Novikova et al., SIGDIAL 2017)
PDF:
https://aclanthology.org/W17-5525.pdf
Data:
E2E