How Multilingual is Multilingual BERT?

Telmo Pires, Eva Schlinger, Dan Garrette


Abstract
In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
Anthology ID: P19-1493
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4996–5001
URL: https://aclanthology.org/P19-1493
DOI: 10.18653/v1/P19-1493
PDF: https://aclanthology.org/P19-1493.pdf
Video: https://vimeo.com/385218970
Data: Universal Dependencies