Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab


Abstract
The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language with only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on 7 languages from three different language families and show that our technique significantly improves translation into low-resource languages compared to other translation baselines.
Anthology ID:
2021.acl-long.66
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
802–812
URL:
https://aclanthology.org/2021.acl-long.66
DOI:
10.18653/v1/2021.acl-long.66
Cite (ACL):
Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, and Mona Diab. 2021. Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 802–812, Online. Association for Computational Linguistics.
Cite (Informal):
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data (Ko et al., ACL 2021)
PDF:
https://aclanthology.org/2021.acl-long.66.pdf
Video:
 https://aclanthology.org/2021.acl-long.66.mp4
Code
 wjko2/NMT-Adapt
Data
 CC100, FLoRes