Tagged Back-Translation

Isaac Caswell, Ciprian Chelba, David Grangier


Abstract
Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative to noising techniques, consisting of tagging back-translated source sentences with an extra token. Our results on WMT outperform noised back-translation in English-Romanian and match performance on English-German, redefining the state-of-the-art on the former.
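The tagging step described in the abstract can be illustrated with a minimal sketch. It assumes the training data is a list of (source, target) string pairs and that the extra token is prepended to the source side. The token string `<BT>` and the helper names `tag_back_translated` and `build_training_set` are hypothetical choices for illustration, not details stated in the abstract.

```python
from typing import Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)

# Hypothetical reserved token; the abstract only says "an extra token".
BT_TAG = "<BT>"


def tag_back_translated(pairs: Iterable[SentencePair], tag: str = BT_TAG) -> List[SentencePair]:
    """Prepend the tag to the source side of each synthetic pair.

    The target side is left untouched, since in back-translation it is
    the genuine (human-written) text.
    """
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]


def build_training_set(
    genuine: Iterable[SentencePair],
    synthetic: Iterable[SentencePair],
    tag: str = BT_TAG,
) -> List[SentencePair]:
    """Combine genuine bitext (unchanged) with tagged back-translated pairs."""
    return list(genuine) + tag_back_translated(synthetic, tag)


if __name__ == "__main__":
    genuine = [("Das ist ein Test.", "This is a test.")]
    # Source side produced by a reverse (target-to-source) model.
    synthetic = [("Guten Morgen.", "Good morning.")]
    for src, tgt in build_training_set(genuine, synthetic):
        print(f"{src}\t{tgt}")
```

In a real pipeline the tag would presumably need to be a reserved vocabulary item so that subword segmentation does not split it; that detail is an assumption here as well.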
Anthology ID:
W19-5206
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
53–63
URL:
https://aclanthology.org/W19-5206
DOI:
10.18653/v1/W19-5206
Cite (ACL):
Isaac Caswell, Ciprian Chelba, and David Grangier. 2019. Tagged Back-Translation. In Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers), pages 53–63, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Tagged Back-Translation (Caswell et al., WMT 2019)
PDF:
https://aclanthology.org/W19-5206.pdf