On the Impact of Various Types of Noise on Neural Machine Translation

Huda Khayrallah, Philipp Koehn


Abstract
We examine how various types of noise in the parallel training data impact the quality of neural machine translation systems. We create five types of artificial noise and analyze how they degrade performance in neural and statistical machine translation. We find that neural models are generally more harmed by noise than statistical models. For one especially egregious type of noise, they learn to just copy the input sentence.
Anthology ID:
W18-2709
Volume:
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Alexandra Birch, Andrew Finch, Thang Luong, Graham Neubig, Yusuke Oda
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
74–83
URL:
https://aclanthology.org/W18-2709/
DOI:
10.18653/v1/W18-2709
Cite (ACL):
Huda Khayrallah and Philipp Koehn. 2018. On the Impact of Various Types of Noise on Neural Machine Translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 74–83, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
On the Impact of Various Types of Noise on Neural Machine Translation (Khayrallah & Koehn, NGT 2018)
PDF:
https://aclanthology.org/W18-2709.pdf