A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation

Àlex R. Atrio; Alexis Allemann; Ljiljana Dolamic; Andrei Popescu-Belis

doi:10.18653/v1/2023.loresmt-1.4

Abstract

Training neural MT systems for low-resource language pairs or in unsupervised settings (i.e., with no parallel data) often involves a large number of auxiliary systems. These may include parent systems trained on higher-resource pairs and used for initializing the parameters of child systems, multilingual systems for neighboring languages, and several stages of systems trained on pseudo-parallel data obtained through back-translation. We propose here a simplified pipeline, which we compare to the best submissions to the WMT 2021 Shared Task on Unsupervised MT and Very Low Resource Supervised MT. Our pipeline needs only two parents, two children, one round of back-translation for low-resource directions, and two rounds for unsupervised ones, and it obtains better or similar scores when compared to more complex alternatives.

Anthology ID: 2023.loresmt-1.4
Volume: Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Editors: Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jade Abbott, Jonathan Washington, Nathaniel Oco, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
Venue: LoResMT
Publisher: Association for Computational Linguistics
Pages: 47–58
URL: https://aclanthology.org/2023.loresmt-1.4/
DOI: 10.18653/v1/2023.loresmt-1.4
Cite (ACL): Àlex R. Atrio, Alexis Allemann, Ljiljana Dolamic, and Andrei Popescu-Belis. 2023. A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation. In Proceedings of the Sixth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023), pages 47–58, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): A Simplified Training Pipeline for Low-Resource and Unsupervised Machine Translation (Atrio et al., LoResMT 2023)
PDF: https://aclanthology.org/2023.loresmt-1.4.pdf
Video: https://aclanthology.org/2023.loresmt-1.4.mp4
