Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models

Henry Elder, Chris Hokamp


Abstract
This work presents state-of-the-art results in the reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques that make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.
Anthology ID:
W18-3606
Volume:
Proceedings of the First Workshop on Multilingual Surface Realisation
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Simon Mille, Anja Belz, Bernd Bohnet, Emily Pitler, Leo Wanner
Venue:
ACL
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
49–53
URL:
https://aclanthology.org/W18-3606/
DOI:
10.18653/v1/W18-3606
Cite (ACL):
Henry Elder and Chris Hokamp. 2018. Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models. In Proceedings of the First Workshop on Multilingual Surface Realisation, pages 49–53, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models (Elder & Hokamp, ACL 2018)
PDF:
https://aclanthology.org/W18-3606.pdf