Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution

Xavier Garcia, Noah Constant, Ankur Parikh, Orhan Firat


Abstract
We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models, paving the way towards efficient continual learning for multilingual machine translation. Our approach is suitable for large-scale datasets, applies to distant languages with unseen scripts, incurs only minor degradation on the translation performance for the original language pairs and provides competitive performance even in the case where we only possess monolingual data for the new languages.
Anthology ID:
2021.naacl-main.93
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1184–1192
URL:
https://aclanthology.org/2021.naacl-main.93
DOI:
10.18653/v1/2021.naacl-main.93
Cite (ACL):
Xavier Garcia, Noah Constant, Ankur Parikh, and Orhan Firat. 2021. Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1184–1192, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution (Garcia et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.93.pdf
Video:
https://aclanthology.org/2021.naacl-main.93.mp4