Model Selection for Cross-lingual Transfer

Yang Chen, Alan Ritter


Abstract
Transformers that are pre-trained on multilingual corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer capabilities. In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language. While this works surprisingly well, substantial variance has been observed in target language performance between different fine-tuning runs, and in the zero-shot setup, no target-language development data is available to select among multiple fine-tuned models. Prior work has relied on English dev data to select among models that are fine-tuned with different learning rates, numbers of training steps, and other hyperparameters, often resulting in suboptimal choices. In this paper, we show that it is possible to select consistently better models when small amounts of annotated data are available in auxiliary pivot languages. We propose a machine learning approach to model selection that uses the fine-tuned model’s own internal representations to predict its cross-lingual capabilities. In extensive experiments we find that this method consistently selects better models than English validation data across twenty-five languages (including eight low-resource languages), and often achieves results that are comparable to model selection using target language development data.
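The sketch below illustrates the general idea of the model-selection setup described in the abstract: instead of choosing among fine-tuning runs using English dev data alone, candidate checkpoints are scored on small annotated dev sets in auxiliary pivot languages. This is a minimal illustration, not the authors' exact method (which learns a predictor from the model's internal representations); all function and variable names are hypothetical.

```python
# Hypothetical sketch: select among fine-tuned checkpoints using small
# pivot-language dev sets rather than English dev data alone.
from typing import Callable, Dict, List


def select_checkpoint(
    checkpoints: List[str],
    pivot_dev_sets: Dict[str, list],
    evaluate: Callable[[str, list], float],
) -> str:
    """Return the checkpoint with the highest mean score across pivot-language dev sets."""
    def mean_pivot_score(ckpt: str) -> float:
        scores = [evaluate(ckpt, dev) for dev in pivot_dev_sets.values()]
        return sum(scores) / len(scores)

    return max(checkpoints, key=mean_pivot_score)


if __name__ == "__main__":
    # Toy example: three fine-tuning runs, two pivot languages, dummy scorer.
    runs = ["run_lr2e-5", "run_lr3e-5", "run_lr5e-5"]
    dev_sets = {"hi": ["..."], "sw": ["..."]}
    dummy_scores = {"run_lr2e-5": 0.61, "run_lr3e-5": 0.66, "run_lr5e-5": 0.58}
    best = select_checkpoint(runs, dev_sets, lambda ckpt, _: dummy_scores[ckpt])
    print("Selected:", best)
```

In the paper's setting, the scoring function would instead be a learned predictor of target-language performance; the selection logic over candidate checkpoints is the same.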
Anthology ID:
2021.emnlp-main.459
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5675–5687
URL:
https://aclanthology.org/2021.emnlp-main.459
DOI:
10.18653/v1/2021.emnlp-main.459
Cite (ACL):
Yang Chen and Alan Ritter. 2021. Model Selection for Cross-lingual Transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5675–5687, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Model Selection for Cross-lingual Transfer (Chen & Ritter, EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.459.pdf
Video:
https://aclanthology.org/2021.emnlp-main.459.mp4
Code:
edchengg/model_selection