BibTeX
@inproceedings{blevins-etal-2022-analyzing,
title = "Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models",
author = "Blevins, Terra and
Gonen, Hila and
Zettlemoyer, Luke",
editor = "Goldberg, Yoav and
Kozareva, Zornitsa and
Zhang, Yue",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.234",
doi = "10.18653/v1/2022.emnlp-main.234",
pages = "3575--3590",
abstract = "The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones. In contrast, the point in pretraining when the model learns to transfer cross-lingually differs across language pairs. Interestingly, we also observe that, across many languages and tasks, the final model layer exhibits significant performance degradation over time, while linguistic knowledge propagates to lower layers of the network. Taken together, these insights highlight the complexity of multilingual pretraining and the resulting varied behavior for different languages over time.",
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="blevins-etal-2022-analyzing">
<titleInfo>
<title>Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Terra</namePart>
<namePart type="family">Blevins</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hila</namePart>
<namePart type="family">Gonen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Luke</namePart>
<namePart type="family">Zettlemoyer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yoav</namePart>
<namePart type="family">Goldberg</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zornitsa</namePart>
<namePart type="family">Kozareva</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yue</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, United Arab Emirates</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones. In contrast, the point in pretraining when the model learns to transfer cross-lingually differs across language pairs. Interestingly, we also observe that, across many languages and tasks, the final model layer exhibits significant performance degradation over time, while linguistic knowledge propagates to lower layers of the network. Taken together, these insights highlight the complexity of multilingual pretraining and the resulting varied behavior for different languages over time.</abstract>
<identifier type="citekey">blevins-etal-2022-analyzing</identifier>
<identifier type="doi">10.18653/v1/2022.emnlp-main.234</identifier>
<location>
<url>https://aclanthology.org/2022.emnlp-main.234</url>
</location>
<part>
<date>2022-12</date>
<extent unit="page">
<start>3575</start>
<end>3590</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models
%A Blevins, Terra
%A Gonen, Hila
%A Zettlemoyer, Luke
%Y Goldberg, Yoav
%Y Kozareva, Zornitsa
%Y Zhang, Yue
%S Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F blevins-etal-2022-analyzing
%X The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones. In contrast, the point in pretraining when the model learns to transfer cross-lingually differs across language pairs. Interestingly, we also observe that, across many languages and tasks, the final model layer exhibits significant performance degradation over time, while linguistic knowledge propagates to lower layers of the network. Taken together, these insights highlight the complexity of multilingual pretraining and the resulting varied behavior for different languages over time.
%R 10.18653/v1/2022.emnlp-main.234
%U https://aclanthology.org/2022.emnlp-main.234
%U https://doi.org/10.18653/v1/2022.emnlp-main.234
%P 3575-3590
Markdown (Informal)
[Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models](https://aclanthology.org/2022.emnlp-main.234) (Blevins et al., EMNLP 2022)
ACL
Terra Blevins, Hila Gonen, and Luke Zettlemoyer. 2022. Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3575–3590, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.