XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

Sebastian Ruder, Noah Constant, Jan Botha, Aditya Siddhant, Orhan Firat, Jinlan Fu, Pengfei Liu, Junjie Hu, Dan Garrette, Graham Neubig, Melvin Johnson


Abstract
Machine learning has brought striking advances in multilingual natural language processing capabilities over the past year. For example, the latest techniques have improved the state-of-the-art performance on the XTREME multilingual benchmark by more than 13 points. While a sizeable gap to human-level performance remains, improvements have been easier to achieve in some tasks than in others. This paper analyzes the current state of cross-lingual transfer learning and summarizes some lessons learned. In order to catalyze meaningful progress, we extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks, including challenging language-agnostic retrieval tasks, and covers 50 typologically diverse languages. In addition, we provide a massively multilingual diagnostic suite and fine-grained multi-dataset evaluation capabilities through an interactive public leaderboard to gain a better understanding of such models.
Anthology ID:
2021.emnlp-main.802
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10215–10245
URL:
https://aclanthology.org/2021.emnlp-main.802
PDF:
https://aclanthology.org/2021.emnlp-main.802.pdf
Code
 google-research/xtreme
Data
COPA, LAReQA, MLQA, Mewsli-9, SQuAD, SuperGLUE, Tatoeba, TyDi QA, TyDiQA-GoldP, XCOPA, XNLI, XQuAD