Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Shijie Wu, Mark Dredze


Abstract
Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best-published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language specific features, and measure factors that influence cross-lingual transfer.
Anthology ID:
D19-1077
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
833–844
Language:
URL:
https://aclanthology.org/D19-1077/
DOI:
10.18653/v1/D19-1077
Bibkey:
Cite (ACL):
Shijie Wu and Mark Dredze. 2019. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 833–844, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT (Wu & Dredze, EMNLP-IJCNLP 2019)
Copy Citation:
PDF:
https://aclanthology.org/D19-1077.pdf
Code
 shijie-wu/crosslingual-nlp +  additional community code
Data
CoNLLCoNLL 2002MLDocXNLI