Efficiently Editing Mixture-of-Experts Models with Compressed Experts

Yifei He, Yang Liu, Chen Liang, Hany Hassan Awadalla


Abstract
Mixture-of-Experts (MoE) models have become a key approach for scaling large language models efficiently by activating only a subset of experts during training and inference. The number of activated experts typically presents a trade-off: fewer experts reduce computational cost, while more experts improve performance. Recent studies reveal that not all activated experts contribute equally to model performance, with some providing minimal utility, particularly when fine-tuning pretrained MoE models for specialized downstream tasks. This coexistence of significant and redundant parameters among experts presents an opportunity to reduce the number of activated experts while maintaining model performance. In this work, we propose the concept of compressed experts, lightweight modules that serve as compact representations of full experts. Our approach preserves the most important experts while replacing the remaining auxiliary activated experts with compressed experts. The resulting reduction in active parameters significantly lowers inference costs while achieving comparable performance. Extensive experiments on models including Phi-MoE and OLMoE demonstrate that compressed experts recover over 90% of full-expert performance across various tasks while reducing active parameters by more than 30% and cutting inference costs by 20%. This approach enables efficient deployment of MoE models in resource-constrained settings and facilitates scaling to larger models with manageable overhead.
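To illustrate the idea sketched in the abstract, the toy example below routes each token to its top-k experts, keeps the highest-scoring expert as a full FFN, and serves the remaining activated slots with lightweight compressed experts (modeled here as low-rank bottlenecks). This is a minimal sketch under assumptions: the module names, dimensions, and the choice of low-rank compression are illustrative only and are not the authors' actual editing or compression procedure.

```python
# Minimal sketch (not the paper's implementation) of an MoE layer where the
# top-1 expert stays full and the other activated slots use compressed experts.
# All hyperparameters and the low-rank design are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullExpert(nn.Module):
    """Standard FFN expert: d_model -> d_ff -> d_model."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)


class CompressedExpert(nn.Module):
    """Lightweight stand-in for a full expert (assumed low-rank bottleneck)."""
    def __init__(self, d_model: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)

    def forward(self, x):
        return self.up(self.down(x))


class EditedMoELayer(nn.Module):
    """Top-k routing; slot 0 uses a full expert, remaining slots use compressed ones."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2, rank=32):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.full_experts = nn.ModuleList(
            [FullExpert(d_model, d_ff) for _ in range(n_experts)]
        )
        self.compressed_experts = nn.ModuleList(
            [CompressedExpert(d_model, rank) for _ in range(n_experts)]
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        gate_logits = self.router(x)
        weights, indices = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for t in range(x.size(0)):  # per-token loop for clarity, not speed
                e = indices[t, slot].item()
                expert = self.full_experts[e] if slot == 0 else self.compressed_experts[e]
                out[t] += weights[t, slot] * expert(x[t])
        return out


if __name__ == "__main__":
    layer = EditedMoELayer()
    tokens = torch.randn(4, 512)
    print(layer(tokens).shape)  # torch.Size([4, 512])
```

In this sketch, only the top-1 slot pays the cost of a full FFN; the auxiliary slots cost roughly 2 * d_model * rank parameters each, which is where the reduction in active parameters would come from under these assumptions.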
Anthology ID:
2025.findings-emnlp.381
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7227–7238
URL:
https://aclanthology.org/2025.findings-emnlp.381/
Cite (ACL):
Yifei He, Yang Liu, Chen Liang, and Hany Hassan Awadalla. 2025. Efficiently Editing Mixture-of-Experts Models with Compressed Experts. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7227–7238, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Efficiently Editing Mixture-of-Experts Models with Compressed Experts (He et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.381.pdf
Checklist:
 2025.findings-emnlp.381.checklist.pdf