Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation

Guillaume Wenzek; Vishrav Chaudhary; Angela Fan; Sahir Gomez; Naman Goyal; Somya Jain; Douwe Kiela; Tristan Thrush; Francisco Guzmán

Abstract

We present the results of the first task on Large-Scale Multilingual Machine Translation. The task consists of the many-to-many evaluation of a single model across a variety of source and target languages. This year, the task consisted of three different settings: (i) SMALL-TASK1 (Central/South-Eastern European Languages), (ii) SMALL-TASK2 (South-East Asian Languages), and (iii) FULL-TASK (all 101 x 100 language pairs). All the tasks used the FLORES-101 dataset as the evaluation benchmark. To ensure the longevity of the dataset, the test sets were not publicly released and the models were evaluated in a controlled environment on Dynabench. There were a total of 10 participating teams for the tasks, with a total of 151 intermediate model submissions and 13 final models. This year’s results show a significant improvement over the known baselines, with +17.8 BLEU for SMALL-TASK2, +10.6 for FULL-TASK, and +3.6 for SMALL-TASK1.

Anthology ID: 2021.wmt-1.2
Volume: Proceedings of the Sixth Conference on Machine Translation
Month: November
Year: 2021
Address: Online
Editors: Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 89–99
URL: https://aclanthology.org/2021.wmt-1.2/
Cite (ACL): Guillaume Wenzek, Vishrav Chaudhary, Angela Fan, Sahir Gomez, Naman Goyal, Somya Jain, Douwe Kiela, Tristan Thrush, and Francisco Guzmán. 2021. Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation. In Proceedings of the Sixth Conference on Machine Translation, pages 89–99, Online. Association for Computational Linguistics.
Cite (Informal): Findings of the WMT 2021 Shared Task on Large-Scale Multilingual Machine Translation (Wenzek et al., WMT 2021)
PDF: https://aclanthology.org/2021.wmt-1.2.pdf
