When and Why is Unsupervised Neural Machine Translation Useless?

Yunsu Kim, Miguel Graça, Hermann Ney


Abstract
This paper studies the practicality of the current state-of-the-art unsupervised methods in neural machine translation (NMT). In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations. We show that their performance is severely affected by linguistic dissimilarity and domain mismatch between source and target monolingual data. Such conditions are common for low-resource language pairs, where unsupervised learning works poorly. In all of our experiments, supervised and semi-supervised baselines with 50k-sentence bilingual data outperform the best unsupervised results. Our analyses pinpoint the limits of current unsupervised NMT and also suggest immediate research directions.
Anthology ID: 2020.eamt-1.5
Volume: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Month: November
Year: 2020
Address: Lisboa, Portugal
Venue: EAMT
Publisher: European Association for Machine Translation
Pages: 35–44
URL: https://aclanthology.org/2020.eamt-1.5
Cite (ACL):
Yunsu Kim, Miguel Graça, and Hermann Ney. 2020. When and Why is Unsupervised Neural Machine Translation Useless?. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 35–44, Lisboa, Portugal. European Association for Machine Translation.
Cite (Informal):
When and Why is Unsupervised Neural Machine Translation Useless? (Kim et al., EAMT 2020)
PDF: https://aclanthology.org/2020.eamt-1.5.pdf