IOL Research Machine Translation Systems for WMT23 Low-Resource Indic Language Translation Shared Task

Wenbo Zhang

doi:10.18653/v1/2023.wmt-1.94

Abstract

This paper describes the IOL Research team’s submission systems for the WMT23 low-resource Indic language translation shared task. We participated in four language pairs: en-as, en-mz, en-kha, and en-mn. Our machine translation models use a Transformer-based neural network architecture. The core of our approach is to improve low-resource translation quality by exploiting monolingual data through pre-training and data augmentation. We first trained two denoising language models, similar to T5 and BART, on monolingual data, and then fine-tuned them on parallel data to obtain two multilingual machine translation models. These models were used to translate English monolingual data into the other languages, producing synthetic multilingual parallel data for augmentation. Finally, we trained multiple translation models from scratch on the augmented data together with the real parallel data, and built the final submission systems by ensembling these models. Experimental results show that our method greatly improves the BLEU scores for all four language pairs.
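
The augmentation step described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `translate(sentence, lang)` callable standing in for a fine-tuned multilingual model; the function names, the tuple layout, and the empty-output filter are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Iterable, List, Tuple

# (target language code, English source, target-side text); layout is an assumption
Pair = Tuple[str, str, str]


def augment_with_forward_translation(
    english_sentences: Iterable[str],
    target_langs: List[str],
    translate: Callable[[str, str], str],
) -> List[Pair]:
    """Translate each English sentence into every target language,
    producing synthetic parallel pairs."""
    synthetic: List[Pair] = []
    for sentence in english_sentences:
        for lang in target_langs:
            hypothesis = translate(sentence, lang)
            # Assumed filter: skip empty or whitespace-only model outputs.
            if hypothesis.strip():
                synthetic.append((lang, sentence, hypothesis))
    return synthetic


def build_training_set(real: Iterable[Pair], synthetic: Iterable[Pair]) -> List[Pair]:
    """Combine real parallel data with synthetic pairs for training from scratch."""
    return list(real) + list(synthetic)


def toy_translate(sentence: str, lang: str) -> str:
    """Hypothetical stand-in for a multilingual MT model; it only tags the input."""
    return f"<{lang}> {sentence}"


if __name__ == "__main__":
    mono = ["Hello world.", "How are you?"]
    langs = ["as", "mz", "kha", "mn"]  # target sides of the four pairs in the abstract
    synthetic = augment_with_forward_translation(mono, langs, toy_translate)
    real = [("as", "Good morning.", "<real as reference>")]  # placeholder real pair
    print(len(build_training_set(real, synthetic)))
```

In practice the `translate` callable would wrap batched decoding with the fine-tuned model, and the combined set would feed each of the models that are later ensembled.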

Anthology ID: 2023.wmt-1.94
Volume: Proceedings of the Eighth Conference on Machine Translation
Month: December
Year: 2023
Address: Singapore
Editors: Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 978–982
URL: https://aclanthology.org/2023.wmt-1.94
DOI: 10.18653/v1/2023.wmt-1.94
Cite (ACL): Wenbo Zhang. 2023. IOL Research Machine Translation Systems for WMT23 Low-Resource Indic Language Translation Shared Task. In Proceedings of the Eighth Conference on Machine Translation, pages 978–982, Singapore. Association for Computational Linguistics.
Cite (Informal): IOL Research Machine Translation Systems for WMT23 Low-Resource Indic Language Translation Shared Task (Zhang, WMT 2023)
PDF: https://aclanthology.org/2023.wmt-1.94.pdf
