DIPLomA: Efficient Adaptation of Instructed LLMs to Low-Resource Languages via Post-Training Delta Merging

Ixak Sarasua Antero; Ander Corral; Xabier Saralegi

Abstract

This paper investigates how open-weight instruction-tuned large language models (LLMs) can be efficiently adapted to low-resource languages without requiring costly large-scale post-training. We introduce DIPLomA (Decoupled Instruction-Preserving Language Adaptation), a lightweight delta-based transfer strategy that provides a practical and effective solution for this scenario. DIPLomA decouples language adaptation from post-training alignment by first continually pretraining a foundational LLM on a modest amount of monolingual target-language data while anchoring on English replay, and then injecting instruction-following capabilities via delta-based weight merging from the instructed counterpart of the base LLM. We evaluate DIPLomA on Basque and validate its generality on Welsh and Swahili, demonstrating consistent and substantial gains in instruction-following, linguistic proficiency, and safety. Compared to strong baselines, our method achieves average relative improvements of 50 points in Basque, 63 in Welsh, and 51 in Swahili, while preserving the original model’s multilingual performance. These results highlight DIPLomA as an effective, resource-efficient strategy for bringing high-quality instruction alignment to underrepresented languages at scale.
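The delta-based merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact merge rule, so the additive form below (adapted weights plus the instructed-minus-base difference) is an assumption, as are the function name `merge_instruction_delta`, the scaling factor `alpha`, and the toy dictionary-of-lists representation of a checkpoint.

```python
from typing import Dict, List

# Toy checkpoint: parameter name -> flat list of values (assumed representation).
Weights = Dict[str, List[float]]


def merge_instruction_delta(
    base: Weights,
    instruct: Weights,
    adapted: Weights,
    alpha: float = 1.0,
) -> Weights:
    """Return adapted + alpha * (instruct - base), parameter by parameter.

    base     : original foundational model
    instruct : instruction-tuned counterpart of `base`
    adapted  : `base` after continual pretraining on the target language
    alpha    : hypothetical scaling factor for the delta (1.0 = plain addition)
    """
    if not (base.keys() == instruct.keys() == adapted.keys()):
        raise ValueError("all three checkpoints must share parameter names")

    merged: Weights = {}
    for name in base:
        b, i, a = base[name], instruct[name], adapted[name]
        if not (len(b) == len(i) == len(a)):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # The delta (i - b) is what post-training changed; add it to the adapted weights.
        merged[name] = [a_k + alpha * (i_k - b_k) for a_k, i_k, b_k in zip(a, i, b)]
    return merged


# Toy usage with made-up numbers.
base = {"w": [1.0, 2.0]}
instruct = {"w": [1.5, 1.0]}
adapted = {"w": [3.0, 4.0]}
merged = merge_instruction_delta(base, instruct, adapted)
print(merged)  # {'w': [3.5, 3.0]}
```

With real checkpoints the same arithmetic would be applied tensor by tensor over the models' parameter dictionaries. Whether the paper scales or filters the delta is not stated in the abstract.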

Anthology ID: 2025.findings-emnlp.1355
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 24898–24912
URL: https://aclanthology.org/2025.findings-emnlp.1355/
Cite (ACL): Ixak Sarasua Antero, Ander Corral, and Xabier Saralegi. 2025. DIPLomA: Efficient Adaptation of Instructed LLMs to Low-Resource Languages via Post-Training Delta Merging. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24898–24912, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): DIPLomA: Efficient Adaptation of Instructed LLMs to Low-Resource Languages via Post-Training Delta Merging (Antero et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1355.pdf
Checklist: 2025.findings-emnlp.1355.checklist.pdf
