Francisco Massa


2025

Mixture of Languages: Improved Multilingual Encoders Through Language Grouping
João Maria Janeiro | Belen Alastruey | Francisco Massa | Maha Elbayad | Benjamin Piwowarski | Patrick Gallinari | Loic Barrault
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We propose Mixture of Languages (MoL), a new strategy for pretraining massively multilingual encoders. Recent work in this area has relied on training transformer encoders on large amounts of multilingual data with all parameters shared across languages, without studying how to balance language transfer and interference for better performance. To address this, MoL groups languages by similarity and adds parallel, sparsely activated layers that process each group independently. This architecture lets MoL boost language transfer while minimizing interference, without increasing the active parameter count. We show that MoL substantially outperforms a dense counterpart trained with the same configuration, as well as MoE models and public multilingual encoders such as XLM-R and mBERT, on downstream tasks.
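
The abstract describes routing each language group through its own sparsely activated layers while keeping the per-example compute of a dense model. Below is a minimal PyTorch sketch of that idea; the module name `LanguageGroupedFFN`, the language-to-group assignments, and all dimensions are illustrative assumptions based only on the abstract, not the paper's implementation.

```python
# Sketch of a language-grouped sparse feed-forward layer (assumptions noted above).
import torch
import torch.nn as nn


class LanguageGroupedFFN(nn.Module):
    """One feed-forward block per language group; each example is routed to
    exactly one group, so the active parameter count per example matches a
    single dense FFN."""

    def __init__(self, d_model: int, d_ff: int, lang_to_group: dict[str, int]):
        super().__init__()
        self.lang_to_group = lang_to_group
        num_groups = max(lang_to_group.values()) + 1
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(num_groups)
        )

    def forward(self, hidden: torch.Tensor, langs: list[str]) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); langs: one language code per example.
        out = torch.empty_like(hidden)
        groups = torch.tensor([self.lang_to_group[lang] for lang in langs])
        for g in groups.unique():
            idx = (groups == g).nonzero(as_tuple=True)[0]
            # Only the expert for this group runs on these examples.
            out[idx] = self.experts[int(g)](hidden[idx])
        return out


# Example with two hypothetical similarity-based groups.
layer = LanguageGroupedFFN(
    d_model=768,
    d_ff=3072,
    lang_to_group={"en": 0, "de": 0, "es": 1, "pt": 1},
)
x = torch.randn(4, 16, 768)
y = layer(x, ["en", "es", "de", "pt"])
print(y.shape)  # torch.Size([4, 16, 768])
```

Because each example activates exactly one group's feed-forward block, per-token compute stays equal to that of a single dense layer, which is the property the abstract highlights when it says the active parameter count does not increase.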