How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning

Rochelle Choenni, Dan Garrette, Ekaterina Shutova


Abstract
Multilingual language models (MLMs) are jointly trained on data from many different languages such that the representations of individual languages can benefit from other languages' data. Impressive performance in zero-shot cross-lingual transfer shows that these models are able to exploit this property. Yet, it remains unclear to what extent, and under which conditions, languages rely on each other's data. To answer this question, we use TracIn (Pruthi et al., 2020), a training data attribution (TDA) method, to retrieve the training samples from multilingual data that are most influential for test predictions in a given language. This allows us to analyse the cross-lingual sharing mechanisms of MLMs from a new perspective. While previous work studied cross-lingual sharing at the model parameter level, we present the first approach to study it at the data level. We find that MLMs rely on data from multiple languages during fine-tuning, and that this reliance increases as fine-tuning progresses. We further find that training samples from other languages can both reinforce and complement the knowledge acquired from data of the test language itself.
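The TracIn method referenced in the abstract scores a training sample's influence on a test prediction by summing, over saved checkpoints, the learning-rate-weighted dot product of the training-sample gradient and the test-sample gradient. Below is a minimal sketch of that scoring rule, assuming per-checkpoint gradient vectors have already been computed; the function name and the toy gradients are illustrative, not taken from the paper's code.

```python
import numpy as np

def tracin_influence(train_grads, test_grads, lrs):
    """TracIn-style influence of one training sample on one test sample:
    sum over checkpoints i of lr_i * <grad_i(train), grad_i(test)>
    (first-order approximation from Pruthi et al., 2020)."""
    return sum(lr * float(np.dot(g_train, g_test))
               for lr, g_train, g_test in zip(lrs, train_grads, test_grads))

# Toy example: gradients at two checkpoints for one (train, test) pair.
train_grads = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
test_grads = [np.array([2.0, 0.0]), np.array([0.0, 1.0])]
score = tracin_influence(train_grads, test_grads, lrs=[0.1, 0.1])
```

A positive score means the training sample's gradient updates pushed the model toward lowering the test loss (a "proponent"); ranking all multilingual training samples by this score is what lets the paper ask which languages a test prediction draws on.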
Anthology ID:
2023.emnlp-main.818
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
13244–13257
URL:
https://aclanthology.org/2023.emnlp-main.818
DOI:
10.18653/v1/2023.emnlp-main.818
Cite (ACL):
Rochelle Choenni, Dan Garrette, and Ekaterina Shutova. 2023. How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13244–13257, Singapore. Association for Computational Linguistics.
Cite (Informal):
How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning (Choenni et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.818.pdf
Video:
https://aclanthology.org/2023.emnlp-main.818.mp4