Jiuyi Li
2023
Continual Learning for Multilingual Neural Machine Translation via Dual Importance-based Model Division
Junpeng Liu
|
Kaiyu Huang
|
Hao Yu
|
Jiuyi Li
|
Jinsong Su
|
Degen Huang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
A persistent goal of multilingual neural machine translation (MNMT) is to continually adapt the model to support new language pairs or improve some current language pairs without accessing the previous training data. To achieve this, the existing methods primarily focus on preventing catastrophic forgetting by making compromises between the original and new language pairs, leading to sub-optimal performance on both translation tasks. To mitigate this problem, we propose a dual importance-based model division method to divide the model parameters into two parts and separately model the translation of the original and new tasks. Specifically, we first remove the parameters that are negligible to the original tasks but essential to the new tasks to obtain a pruned model, which is responsible for the original translation tasks. Then we expand the pruned model with external parameters and fine-tune the newly added parameters with new training data. The whole fine-tuned model will be used for the new translation tasks. Experimental results show that our method can efficiently adapt the original model to various new translation tasks while retaining the performance of the original tasks. Further analyses demonstrate that our method consistently outperforms several strong baselines under different incremental translation scenarios.
2022
Adaptive Token-level Cross-lingual Feature Mixing for Multilingual Neural Machine Translation
Junpeng Liu
|
Kaiyu Huang
|
Jiuyi Li
|
Huan Liu
|
Jinsong Su
|
Degen Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Multilingual neural machine translation aims to translate multiple language pairs in a single model and has shown great success thanks to the knowledge transfer across languages with the shared parameters. Despite promising, this share-all paradigm suffers from insufficient ability to capture language-specific features. Currently, the common practice is to insert or search language-specific networks to balance the shared and specific features. However, those two types of features are not sufficient enough to model the complex commonality and divergence across languages, such as the locally shared features among similar languages, which leads to sub-optimal transfer, especially in massively multilingual translation. In this paper, we propose a novel token-level feature mixing method that enables the model to capture different features and dynamically determine the feature sharing across languages. Based on the observation that the tokens in the multilingual model are usually shared by different languages, we we insert a feature mixing layer into each Transformer sublayer and model each token representation as a mix of different features, with a proportion indicating its feature preference. In this way, we can perform fine-grained feature sharing and achieve better multilingual transfer. Experimental results on multilingual datasets show that our method outperforms various strong baselines and can be extended to zero-shot translation. Further analyses reveal that our method can capture different linguistic features and bridge the representation gap across languages.
Search
Fix data
Co-authors
- Kaiyu Huang 2
- Degen Huang 2
- Junpeng Liu 2
- Jinsong Su 2
- Huan Liu 1
- show all...
- Hao Yu 1