The NiuTrans Machine Translation Systems for WMT22
Weiqiao Shan | Zhiquan Cao | Yuchen Han | Siming Wu | Yimin Hu | Jie Wang | Yi Zhang | Hou Baoyu | Hang Cao | Chenghao Gao | Xiaowen Liu | Tong Xiao | Anxiang Ma | Jingbo Zhu
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper describes the NiuTrans neural machine translation systems of the WMT22 General MT constrained task. We participate in four directions, including Chinese→English, English→Croatian, and Livonian↔English. Our models are based on several advanced Transformer variants, e.g., Transformer-ODE, Universal Multiscale Transformer (UMST). The main workflow consists of data filtering, large-scale data augmentation (i.e., iterative back-translation, iterative knowledge distillation), and specific-domain fine-tuning. Moreover, we try several multi-domain methods, such as a multi-domain model structure and a multi-domain data clustering method, to rise to this year’s newly proposed multi-domain test set challenge. For low-resource scenarios, we build a multi-language translation model to enhance the performance, and try to use the pre-trained language model (mBERT) to initialize the translation model.
The NiuTrans Machine Translation Systems for WMT20
Yuhao Zhang | Ziyang Wang | Runzhe Cao | Binghao Wei | Weiqiao Shan | Shuhan Zhou | Abudurexiti Reheman | Tao Zhou | Xin Zeng | Laohu Wang | Yongyu Mu | Jingnan Zhang | Xiaoqian Liu | Xuanjun Zhou | Yinqiao Li | Bei Li | Tong Xiao | Jingbo Zhu
Proceedings of the Fifth Conference on Machine Translation
This paper describes NiuTrans neural machine translation systems of the WMT20 news translation tasks. We participated in Japanese<->English, English->Chinese, Inuktitut->English and Tamil->English total five tasks and rank first in Japanese<->English both sides. We mainly utilized iterative back-translation, different depth and widen model architectures, iterative knowledge distillation and iterative fine-tuning. And we find that adequately widened and deepened the model simultaneously, the performance will significantly improve. Also, iterative fine-tuning strategy we implemented is effective during adapting domain. For Inuktitut->English and Tamil->English tasks, we built multilingual models separately and employed pretraining word embedding to obtain better performance.
- Tong Xiao 2
- Jingbo Zhu 2
- Yuhao Zhang 1
- Ziyang Wang 1
- Runzhe Cao 1
- show all...