How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM

Shaoxiong Ji, Pinzhen Chen


Abstract
Instruction tuning a large language model with multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether having a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall, we find that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy often improves significantly if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more than merely the number of languages, though different languages benefit to varying degrees.
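To make the setup concrete, below is a minimal sketch of fine-tuning a BLOOM checkpoint on an instruction mixture restricted to the first k languages, assuming the Hugging Face transformers and datasets libraries. The model size, language codes, and per-language data loader are placeholders for illustration, not the authors' actual configuration or data.

```python
# Minimal sketch (not the authors' code): instruction-tune BLOOM on a
# mixture drawn from the first K of a list of languages.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset, concatenate_datasets

MODEL_NAME = "bigscience/bloom-560m"    # small BLOOM variant for illustration
LANGS = ["en", "es", "fr", "zh", "ar"]  # placeholder language codes
K = 3                                   # number of languages in the mixture

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def load_lang_split(lang):
    # Hypothetical helper: returns instruction/response pairs for one language.
    # Replace with a real multilingual instruction dataset of your choice.
    return Dataset.from_dict({
        "instruction": [f"({lang}) example instruction"],
        "output": [f"({lang}) example response"],
    })

def to_features(batch):
    # Concatenate prompt and response into a single causal-LM training string.
    texts = [f"### Instruction:\n{i}\n### Response:\n{o}"
             for i, o in zip(batch["instruction"], batch["output"])]
    enc = tokenizer(texts, truncation=True, max_length=512, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()
    return enc

# Build the K-language mixture and tokenize it for causal-LM fine-tuning.
mixture = concatenate_datasets([load_lang_split(l) for l in LANGS[:K]])
train_ds = mixture.map(to_features, batched=True,
                       remove_columns=mixture.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bloom-it",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=train_ds,
)
trainer.train()
```

Varying K from 1 up to the full language list is the knob the study turns to examine how language coverage affects downstream multilingual performance.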
Anthology ID:
2025.coling-main.175
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
2575–2581
URL:
https://aclanthology.org/2025.coling-main.175/
Cite (ACL):
Shaoxiong Ji and Pinzhen Chen. 2025. How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM. In Proceedings of the 31st International Conference on Computational Linguistics, pages 2575–2581, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM (Ji & Chen, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.175.pdf