Machine Translation Pre-training for Data-to-Text Generation - A Case Study in Czech

Mihir Kale, Scott Roy


Abstract
While there is a large body of research studying deep learning methods for text generation from structured data, almost all of it focuses purely on English. In this paper, we study the effectiveness of machine translation-based pre-training for data-to-text generation in non-English languages. Since the structured data is generally expressed in English, text generation into other languages involves elements of translation, transliteration and copying - elements already encoded in neural machine translation systems. Moreover, since data-to-text corpora are typically small, this task can benefit greatly from pre-training. We conduct experiments on Czech, a morphologically complex language. Results show that machine translation pre-training lets us train end-to-end models that significantly improve upon unsupervised pre-training and linguistically informed pipelined neural systems, as judged by automatic metrics and human evaluation. We also show that this approach enjoys several desirable properties, including improved performance in low-data scenarios and applicability to low-resource languages.
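To make the recipe described in the abstract concrete, the minimal sketch below fine-tunes a seq2seq model that has already been pre-trained on English-to-Czech machine translation on a single linearized data-to-text pair. It assumes the HuggingFace Transformers API; the Helsinki-NLP/opus-mt-en-cs checkpoint, the linearize helper, and the toy training pair are illustrative stand-ins, not the authors' actual MT model, linearization scheme, or corpus.

```python
# Illustrative sketch only: the checkpoint, linearization scheme, and toy
# example are assumptions standing in for the paper's own MT model and
# data-to-text corpus.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# An off-the-shelf English->Czech MT model serves as the pre-trained
# starting point (the paper trains its own MT system).
MT_CHECKPOINT = "Helsinki-NLP/opus-mt-en-cs"
tokenizer = AutoTokenizer.from_pretrained(MT_CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(MT_CHECKPOINT)

def linearize(record: dict) -> str:
    """Flatten an English-keyed structured record into a source string."""
    return " | ".join(f"{key} = {value}" for key, value in record.items())

# One hypothetical training pair: structured data in, Czech text out.
source = linearize({"name": "Hotel Praha", "area": "city centre", "rating": "5"})
target = "Hotel Praha je pětihvězdičkový hotel v centru města."

# Standard seq2seq fine-tuning step: encode the pair and minimize
# cross-entropy of the Czech reference given the linearized data.
batch = tokenizer(source, text_target=target, return_tensors="pt")
loss = model(**batch).loss
loss.backward()  # in practice, wrap this in an optimizer loop
```

Because the pre-trained encoder-decoder already handles translating, transliterating, and copying English material into Czech, fine-tuning only has to teach the mapping from linearized records to fluent sentences, which is why a small data-to-text corpus can suffice.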
Anthology ID: 2020.inlg-1.13
Volume: Proceedings of the 13th International Conference on Natural Language Generation
Month: December
Year: 2020
Address: Dublin, Ireland
Editors: Brian Davis, Yvette Graham, John Kelleher, Yaji Sripada
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 91–96
URL: https://aclanthology.org/2020.inlg-1.13
DOI: 10.18653/v1/2020.inlg-1.13
Cite (ACL): Mihir Kale and Scott Roy. 2020. Machine Translation Pre-training for Data-to-Text Generation - A Case Study in Czech. In Proceedings of the 13th International Conference on Natural Language Generation, pages 91–96, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Machine Translation Pre-training for Data-to-Text Generation - A Case Study in Czech (Kale & Roy, INLG 2020)
PDF: https://aclanthology.org/2020.inlg-1.13.pdf