MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer

Seongtae Hong, Seungyoon Lee, Hyeonseok Moon, Heuiseok Lim


Abstract
Large Language Models (LLMs) have rapidly advanced, with domain-specific expert models emerging to handle specialized tasks across various fields. However, development has focused predominantly on English-centric models, which demand extensive data, making it challenging to build comparable models for mid- and low-resource languages. To address this limitation, we introduce Migrate, a novel method that leverages open-source static embedding models and up to 3 million tokens of code-switching data to facilitate the seamless transfer of embeddings to target languages. Migrate enables effective cross-lingual adaptation without requiring large-scale domain-specific corpora in the target language, making expert LLMs accessible to a diverse range of linguistic communities. Our experimental results demonstrate that Migrate significantly enhances model performance in target languages, outperforming baseline and existing cross-lingual transfer methods. This approach provides a practical and efficient solution for extending the capabilities of domain-specific expert models.
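The abstract describes transferring embeddings to a target language using open-source static embedding models. The paper's exact procedure is not given here, but a common ingredient in such cross-lingual embedding transfer is learning a linear map between static embedding spaces from a small seed lexicon and using the projected vectors to initialize new target-language token embeddings. The sketch below illustrates that generic idea with toy random data; the dimensions, variable names, and least-squares mapping are illustrative assumptions, not Migrate itself.

```python
import numpy as np

# Toy stand-ins for static embeddings of a small bilingual seed lexicon.
# In practice these would come from pretrained static models (e.g. fastText).
rng = np.random.default_rng(0)
d = 8                       # embedding dimension (toy size)
n_pairs = 20                # aligned word pairs in the seed lexicon
X = rng.normal(size=(n_pairs, d))   # source-language static embeddings
Y = rng.normal(size=(n_pairs, d))   # target-language static embeddings

# Learn a linear map W projecting target static vectors into the source
# space by ordinary least squares: minimize ||Y @ W - X||^2 over W.
W, *_ = np.linalg.lstsq(Y, X, rcond=None)

# Transfer step (illustrative): initialize the LLM's embedding row for a
# new target-language token from its static vector projected through W.
new_token_static = rng.normal(size=(d,))
init_embedding = new_token_static @ W

print(W.shape, init_embedding.shape)
```

Under this sketch, the mapped vector lands in the same space as the expert model's existing embeddings, so the new token can be used without retraining the full embedding matrix; the code-switching data mentioned in the abstract would then further adapt the model.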
Anthology ID:
2025.coling-main.617
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
9184–9193
URL:
https://aclanthology.org/2025.coling-main.617/
Cite (ACL):
Seongtae Hong, Seungyoon Lee, Hyeonseok Moon, and Heuiseok Lim. 2025. MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9184–9193, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
MIGRATE: Cross-Lingual Adaptation of Domain-Specific LLMs through Code-Switching and Embedding Transfer (Hong et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.617.pdf