mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

Linting Xue; Noah Constant; Adam Roberts; Mihir Kale; Rami Al-Rfou’; Aditya Siddhant; Aditya Barua; Colin Raffel

doi:10.18653/v1/2021.naacl-main.41

Abstract

The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

Anthology ID:: 2021.naacl-main.41
Volume:: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:: June
Year:: 2021
Address:: Online
Editors:: Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:: NAACL
Publisher:: Association for Computational Linguistics
Pages:: 483–498
URL:: https://aclanthology.org/2021.naacl-main.41
DOI:: 10.18653/v1/2021.naacl-main.41
Cite (ACL):: Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2021. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Online. Association for Computational Linguistics.
Cite (Informal):: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer (Xue et al., NAACL 2021)
PDF:: https://aclanthology.org/2021.naacl-main.41.pdf
Video:: https://aclanthology.org/2021.naacl-main.41.mp4
Code: google-research/multilingual-t5
Data: mC4, C4, DaNetQA, LiDiRus, MLQA, MuSeRC, PARus, PAWS-X, RCB, RWSD, RuCoS, SQuAD, TERRa, XQuAD, XTREME