Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge

Xin Zhao; Naoki Yoshinaga; Daisuke Oba

doi:10.18653/v1/2024.eacl-long.127

Abstract

Acquiring factual knowledge in low-resource languages poses a serious challenge for language models (LMs), which motivates reliance on cross-lingual transfer in multilingual LMs (ML-LMs). In this study, we ask how ML-LMs acquire and represent factual knowledge. Using the multilingual factual knowledge probing dataset, mLAMA, we first conducted a neuron investigation of ML-LMs (specifically, multilingual BERT). We then traced the roots of facts back to the knowledge source (Wikipedia) to identify the ways in which ML-LMs acquire specific facts. We finally identified three patterns of acquiring and representing facts in ML-LMs: language-independent, cross-lingual shared, and cross-lingual transferred, and devised methods for differentiating them. Our findings highlight the challenge of maintaining consistent factual knowledge across languages, underscoring the need for better fact representation learning in ML-LMs.

Anthology ID: 2024.eacl-long.127
Volume: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2088–2102
URL: https://aclanthology.org/2024.eacl-long.127/
DOI: 10.18653/v1/2024.eacl-long.127
Cite (ACL): Xin Zhao, Naoki Yoshinaga, and Daisuke Oba. 2024. Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2088–2102, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge (Zhao et al., EACL 2024)
PDF: https://aclanthology.org/2024.eacl-long.127.pdf
Video: https://aclanthology.org/2024.eacl-long.127.mp4
