%0 Conference Proceedings
%T Enriching the E2E dataset
%A Castro Ferreira, Thiago
%A Vaz, Helena
%A Davis, Brian
%A Pagano, Adriana
%Y Belz, Anya
%Y Fan, Angela
%Y Reiter, Ehud
%Y Sripada, Yaji
%S Proceedings of the 14th International Conference on Natural Language Generation
%D 2021
%8 August
%I Association for Computational Linguistics
%C Aberdeen, Scotland, UK
%F castro-ferreira-etal-2021-enriching
%X This study introduces an enriched version of the E2E dataset, one of the most popular language resources for data-to-text NLG. We extract intermediate representations for popular pipeline tasks such as discourse ordering, text structuring, lexicalization and referring expression generation, enabling researchers to rapidly develop and evaluate their data-to-text pipeline systems. The intermediate representations are extracted by aligning non-linguistic and text representations through a process called delexicalization, which consists of replacing input referring expressions to entities/attributes with placeholders. The enriched dataset is publicly available.
%R 10.18653/v1/2021.inlg-1.18
%U https://aclanthology.org/2021.inlg-1.18
%U https://doi.org/10.18653/v1/2021.inlg-1.18
%P 177-183