Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages

Wenxiang Jiao, Zhaopeng Tu, Jiarui Li, Wenxuan Wang, Jen-tse Huang, Shuming Shi


Abstract
This paper describes Tencent’s multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages. We participated in the constrained translation track, in which only the data and pretrained models provided by the organizers are allowed. The task is challenging due to three problems: the absence of training data for some to-be-evaluated language pairs, the uneven optimization of language pairs caused by data imbalance, and the curse of multilinguality. To address these problems, we adopt data augmentation, distributionally robust optimization, and language family grouping, respectively, to develop our multilingual neural machine translation (MNMT) models. Our submissions won 1st place on the blind test sets in terms of the automatic evaluation metrics. Code, models, and detailed competition results are available at https://github.com/wxjiao/WMT2022-Large-Scale-African.
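The abstract names distributionally robust optimization (DRO) as the remedy for uneven optimization across language pairs but does not state its formulation. As a rough illustration only, the sketch below shows one common variant, the exponentiated-gradient weight update used in group DRO, in which language pairs with a higher training loss receive more weight in the objective. This is an assumption, not necessarily the authors' formulation; the function names, the step size, and the language-pair keys are hypothetical.

```python
import math

# Hypothetical sketch of a group-DRO-style reweighting over language pairs.
# Not taken from the paper; the authors' DRO variant may differ.


def update_pair_weights(weights, losses, step_size=0.1):
    """Exponentiated-gradient update of per-language-pair weights.

    weights: dict mapping a language pair (e.g. "eng-swh") to its current weight.
    losses: dict mapping the same pairs to their latest average training loss.
    step_size: hypothetical learning rate for the weights.
    """
    # Multiply each weight by exp(step_size * loss): pairs with higher loss gain mass.
    unnormalized = {
        pair: w * math.exp(step_size * losses[pair]) for pair, w in weights.items()
    }
    total = sum(unnormalized.values())
    # Renormalize so the weights remain a probability distribution over pairs.
    return {pair: v / total for pair, v in unnormalized.items()}


def robust_loss(weights, losses):
    """Weighted training objective: expected loss under the current pair weights."""
    return sum(weights[pair] * losses[pair] for pair in weights)


if __name__ == "__main__":
    # Illustrative language pairs and losses (made-up numbers, not results).
    pairs = ["eng-swh", "eng-kin", "fra-wol"]
    weights = {p: 1.0 / len(pairs) for p in pairs}
    losses = {"eng-swh": 1.2, "eng-kin": 2.5, "fra-wol": 3.1}
    new_weights = update_pair_weights(weights, losses, step_size=0.5)
    print(new_weights)
    print(robust_loss(weights, losses), robust_loss(new_weights, losses))
```

Starting from uniform weights, one update shifts mass toward the pair with the highest loss, so the weighted objective concentrates training on the pairs that lag behind rather than letting high-resource pairs dominate the gradient.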
Anthology ID:
2022.wmt-1.102
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
1049–1056
URL:
https://aclanthology.org/2022.wmt-1.102
Cite (ACL):
Wenxiang Jiao, Zhaopeng Tu, Jiarui Li, Wenxuan Wang, Jen-tse Huang, and Shuming Shi. 2022. Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 1049–1056, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages (Jiao et al., WMT 2022)
PDF:
https://aclanthology.org/2022.wmt-1.102.pdf