Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model

Chong Li; Yingzhuo Deng; Jiajun Zhang; Chengqing Zong

doi:10.18653/v1/2025.findings-acl.87

Abstract

The curse of multilinguality is a fundamental problem of multilingual Large Language Models (LLMs): competition among a large number of languages degrades performance. It mainly stems from limited model capacity and negative transfer between dissimilar languages. To address this issue, we propose a method that dynamically groups languages and scales up the parameters of a multilingual LLM while boosting positive transfer among similar languages. Specifically, the model is first tuned on monolingual corpora to determine the parameter deviation in each layer and to quantify the similarity between languages. Layers with larger deviations are extended to mixture-of-experts layers to reduce competition between languages, with each expert module serving one group of similar languages. Experimental results on 18 to 128 languages show that our method reduces negative transfer between languages and significantly boosts multilingual performance with fewer parameters. Such language-group specialization of experts benefits adaptation to new languages and reduces interference with previously learned multilingual knowledge.
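
The group-then-scale idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the deviation metric, the similarity measure, or the grouping procedure, so the mean absolute parameter change, the cosine similarity of parameter deltas, the greedy threshold grouping, and every function name below are assumptions made purely for illustration. Parameters are represented as plain lists of floats per layer.

```python
import math

def layer_deviation(base, tuned):
    """Mean absolute parameter change per layer (assumed metric)."""
    return {
        name: sum(abs(t - b) for b, t in zip(base[name], tuned[name])) / len(base[name])
        for name in base
    }

def delta_vector(base, tuned):
    """Flattened parameter delta across all layers, in sorted layer order."""
    return [t - b for name in sorted(base) for b, t in zip(base[name], tuned[name])]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_languages(deltas, threshold):
    """Greedy grouping (assumed): join the first group whose members are all similar."""
    groups = []
    for lang in sorted(deltas):
        for group in groups:
            if all(cosine(deltas[lang], deltas[m]) >= threshold for m in group):
                group.append(lang)
                break
        else:
            groups.append([lang])
    return groups

def layers_to_expand(deviations, top_k):
    """Select the top_k layers with the largest deviation averaged over languages."""
    layers = list(next(iter(deviations.values())))
    avg = {l: sum(d[l] for d in deviations.values()) / len(deviations) for l in layers}
    return sorted(avg, key=avg.get, reverse=True)[:top_k]

# Hypothetical toy weights: one base model and one tuned copy per language.
base = {"layer0": [0.0, 0.0], "layer1": [0.0, 0.0]}
tuned = {
    "es": {"layer0": [0.1, 0.0], "layer1": [1.0, 0.2]},
    "pt": {"layer0": [0.1, 0.1], "layer1": [0.9, 0.3]},
    "zh": {"layer0": [0.0, 0.1], "layer1": [-1.0, 0.5]},
}

deviations = {lang: layer_deviation(base, w) for lang, w in tuned.items()}
deltas = {lang: delta_vector(base, w) for lang, w in tuned.items()}

groups = group_languages(deltas, threshold=0.5)  # one expert per group
expand = layers_to_expand(deviations, top_k=1)   # layers to turn into MoE layers
print(groups, expand)
```

In this toy setting, each resulting group would be served by one expert in every layer selected for expansion, while the remaining layers stay shared across all languages.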

Anthology ID: 2025.findings-acl.87
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1730–1754
URL: https://aclanthology.org/2025.findings-acl.87/
DOI: 10.18653/v1/2025.findings-acl.87
Cite (ACL): Chong Li, Yingzhuo Deng, Jiajun Zhang, and Chengqing Zong. 2025. Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model. In Findings of the Association for Computational Linguistics: ACL 2025, pages 1730–1754, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model (Li et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.87.pdf
