Identifying Elements Essential for BERT’s Multilinguality

Philipp Dufter, Hinrich Schütze


Abstract
It has been shown that multilingual BERT (mBERT) yields high-quality multilingual representations and enables effective zero-shot transfer. This is surprising given that mBERT does not use any crosslingual signal during training. While recent literature has studied this phenomenon, the reasons for this multilinguality remain somewhat obscure. We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual. To allow for fast experimentation, we propose an efficient setup with small BERT models trained on a mix of synthetic and natural data. Overall, we identify four architectural and two linguistic elements that influence multilinguality. Based on our insights, we experiment with a multilingual pretraining setup that modifies the masking strategy using VecMap, i.e., unsupervised embedding alignment. Experiments on XNLI with three languages indicate that our findings transfer from our small setup to larger scale settings.
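The alignment-informed masking mentioned in the abstract can be illustrated with a minimal sketch. The paper couples masking with VecMap; the sketch below does not reimplement VecMap's unsupervised pipeline but substitutes a single orthogonal Procrustes step over a hypothetical seed dictionary, and all embeddings, vocabulary ids, names (procrustes, nearest_tgt, mask_tokens), and probabilities are illustrative assumptions rather than the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Toy monolingual embeddings (vocab_size x dim); in practice these would
# be learned from each language's corpus (assumption, not the paper's data).
src_emb = rng.normal(size=(100, 16))
tgt_emb = rng.normal(size=(100, 16))

# Hypothetical seed dictionary of (src_id, tgt_id) pairs; VecMap would
# induce such a dictionary without supervision.
seed = [(i, i) for i in range(20)]

def procrustes(src, tgt, pairs):
    """Orthogonal map W minimizing ||src[p] @ W - tgt[p]||_F (SVD solution)."""
    X = src[[s for s, _ in pairs]]
    Y = tgt[[t for _, t in pairs]]
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

W = procrustes(src_emb, tgt_emb, seed)

def nearest_tgt(src_id):
    """Cross-lingual nearest neighbor of a source token in the aligned space."""
    v = src_emb[src_id] @ W
    sims = tgt_emb @ v / (np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(v))
    return int(np.argmax(sims))

MASK_ID = -1  # placeholder [MASK] token id (assumption)

def mask_tokens(token_ids, p_mask=0.15, p_translate=0.5):
    """Standard MLM masking, except a masked position is filled with its
    aligned translation (instead of [MASK]) with probability p_translate."""
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if rng.random() < p_mask:
            labels.append(tok)
            inputs[i] = nearest_tgt(tok) if rng.random() < p_translate else MASK_ID
        else:
            labels.append(-100)  # position ignored by the MLM loss
    return inputs, labels

print(mask_tokens([3, 7, 42, 8, 15]))

The intuition is that occasionally presenting a token's crosslingual neighbor as input gives the model an explicit bilingual signal during masked language modeling, which otherwise must be inferred from shared structure alone.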
Anthology ID: 2020.emnlp-main.358
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 4423–4437
URL: https://aclanthology.org/2020.emnlp-main.358
DOI: 10.18653/v1/2020.emnlp-main.358
Cite (ACL): Philipp Dufter and Hinrich Schütze. 2020. Identifying Elements Essential for BERT’s Multilinguality. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4423–4437, Online. Association for Computational Linguistics.
Cite (Informal): Identifying Elements Essential for BERT’s Multilinguality (Dufter & Schütze, EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.358.pdf
Video: https://slideslive.com/38938893