BTS: Harmonizing Specialized Experts into a Generalist LLM

Qizhen Zhang; Prajjwal Bhargava; Chloe Bi; Chris X. Cai; Jakob Nicolaus Foerster; Jeremy Fu; Punit Singh Koura; Ruan Silva; Sheng Shen; Emily Dinan; Suchin Gururangan; Mike Lewis

doi:10.18653/v1/2025.emnlp-main.347

Abstract

We present Branch-Train-Stitch (BTS), an efficient and flexible training algorithm for combining independently trained large language model (LLM) experts into a single, capable generalist model. Following Li et al., we start with a single seed language model which is branched into domain-specific (e.g., coding or math) experts with continual pretraining. BTS combines experts into a generalist model using lightweight stitch layers, which are inserted between frozen experts and the seed LLM, and trained on a small datamix of the expert domains. Stitch layers enable the seed LLM to integrate representations from any number of experts during the forward pass, allowing it to generalize to new domains, despite remaining frozen. Because BTS does not alter the constituent LLMs, BTS provides a modular and flexible approach: experts can be easily removed and new experts can be added with only a small amount of training. Compared to alternative model merging approaches, BTS yields the best generalist performance on a variety of downstream tasks, retaining the specialized capabilities of each of the experts.
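The abstract does not specify the internal form of a stitch layer, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a stitch layer adds a learned linear projection of each frozen expert's hidden state to the frozen seed model's hidden state. The names `stitch`, `matvec`, `zeros`, and `projections`, and the additive form itself, are hypothetical and chosen for illustration.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def zeros(n: int) -> Matrix:
    """Return an n x n zero matrix (hypothetical initialization choice)."""
    return [[0.0] * n for _ in range(n)]


def stitch(
    seed_hidden: Vector,
    expert_hiddens: Dict[str, Vector],
    projections: Dict[str, Matrix],
) -> Vector:
    """Assumed stitch step: seed state plus a projection of each expert state.

    Only `projections` would be trained; the seed and expert states come
    from frozen models. Experts are keyed by name, so adding or removing
    one only adds or removes an entry in the two dictionaries.
    """
    out = list(seed_hidden)
    for name, hidden in expert_hiddens.items():
        update = matvec(projections[name], hidden)
        out = [o + u for o, u in zip(out, update)]
    return out


# Toy hidden states of width 2 for a seed model and two experts.
seed = [1.0, 2.0]
experts = {"code": [3.0, 4.0], "math": [5.0, 6.0]}

# With zero projections the stitched output equals the seed's own state.
untrained = {"code": zeros(2), "math": zeros(2)}
at_init = stitch(seed, experts, untrained)

# Pretend training made the "math" projection the identity matrix.
trained = {"code": zeros(2), "math": [[1.0, 0.0], [0.0, 1.0]]}
with_math = stitch(seed, experts, trained)

# Dropping the "code" expert requires no change to the remaining pieces.
math_only = stitch(seed, {"math": experts["math"]}, trained)
```

In this sketch the experts are looked up by name, which mirrors the modularity claim in the abstract: removing an expert is a dictionary deletion, and adding one would require training only its new projection. The stitch layers described in the paper may differ substantially from this additive form.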

Anthology ID: 2025.emnlp-main.347
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6816–6834
URL: https://aclanthology.org/2025.emnlp-main.347/
DOI: 10.18653/v1/2025.emnlp-main.347
Cite (ACL): Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Nicolaus Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, and Mike Lewis. 2025. BTS: Harmonizing Specialized Experts into a Generalist LLM. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 6816–6834, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): BTS: Harmonizing Specialized Experts into a Generalist LLM (Zhang et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.347.pdf
Checklist: 2025.emnlp-main.347.checklist.pdf
