COMPEL: Compensated Mixture-of-Experts Pruning with Expert-Layer distribution

Seohee Yoon; Yong Suk Choi

Abstract

Mixture-of-Experts (MoE) architectures have emerged as an effective approach for scaling Large Language Models (LLMs) by activating only a subset of experts during inference. Despite their computational efficiency, MoE models face a substantial memory bottleneck, because all expert parameters must remain in memory during inference. To address this challenge, numerous MoE pruning methods have been proposed. However, most existing methods prune uniformly across layers and therefore fail to capture layer-wise variations in expert importance and redundancy. In this paper, we propose COmpensated MoE Pruning with Expert-Layer distribution (COMPEL). COMPEL performs layer-adaptive expert pruning: it estimates expert importance using Fisher information and derives layer importance from layer-wise outlier distributions, so that pruning decisions reflect layer-wise heterogeneity. Furthermore, to mitigate the performance degradation caused by expert pruning, we propose a Fisher-information-guided expert weight compensation method. Experiments on Qwen1.5-MoE-A2.7B show that COMPEL achieves near-lossless performance at 25% expert pruning and stays within a 4% margin even at 50% pruning. Moreover, COMPEL consistently outperforms existing pruning methods while substantially reducing inference latency and peak GPU memory usage.
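The abstract names three ingredients (Fisher-based expert importance, outlier-based layer importance, and a layer-adaptive pruning budget) without giving formulas. The following is only a minimal sketch of how such pieces could fit together, under assumptions that do not come from the paper: expert importance is taken as the trace of the diagonal empirical Fisher (mean over calibration samples of summed squared gradients), a layer's outlier ratio is the fraction of weights whose magnitude exceeds `m` times the layer's mean magnitude, and the global pruning budget is split across layers with a hypothetical inverse weighting controlled by `alpha`. All function names and parameters are hypothetical, and the paper's weight compensation step is not modeled.

```python
from typing import List


def fisher_score(grads: List[List[float]]) -> float:
    """Assumed expert importance: trace of the diagonal empirical Fisher.

    grads[n] is the flattened gradient of one expert's parameters on
    calibration sample n; the score is the per-sample mean of the summed
    squared gradient entries.
    """
    return sum(sum(g * g for g in sample) for sample in grads) / len(grads)


def outlier_ratio(weights: List[float], m: float = 5.0) -> float:
    """Fraction of weights whose magnitude exceeds m times the mean magnitude."""
    mags = [abs(w) for w in weights]
    mean = sum(mags) / len(mags)
    return sum(1 for a in mags if a > m * mean) / len(mags)


def allocate_prune_counts(ratios: List[float], num_experts: int,
                          target: float, alpha: float = 100.0) -> List[int]:
    """Split a global expert-pruning budget across layers (hypothetical rule).

    Layers with a higher outlier ratio are treated as more important and get
    a smaller share of the budget. Every layer keeps at least one expert.
    """
    cap = num_experts - 1
    total = min(round(target * num_experts * len(ratios)), cap * len(ratios))
    shares = [1.0 / (1.0 + alpha * r) for r in ratios]
    s = sum(shares)
    raw = [total * sh / s for sh in shares]
    counts = [min(int(x), cap) for x in raw]
    # Largest-remainder rounding so the counts sum exactly to the budget.
    remaining = total - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    while remaining > 0:
        for i in order:
            if remaining > 0 and counts[i] < cap:
                counts[i] += 1
                remaining -= 1
    return counts


def prune_layer(scores: List[float], k: int) -> List[int]:
    """Drop the k lowest-scoring experts; return the kept expert indices."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return sorted(ranked[: len(scores) - k])
```

For example, with four layers of eight experts, outlier ratios `[0.0, 0.01, 0.0, 0.01]`, and a 50% target, `allocate_prune_counts` returns `[5, 3, 5, 3]`: the two outlier-heavy layers lose fewer experts while the total still equals half of all experts. Within a layer, `prune_layer` then removes the experts with the lowest assumed Fisher scores.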

Anthology ID:: 2026.findings-acl.1521
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 30408–30423
URL:: https://aclanthology.org/2026.findings-acl.1521/
Cite (ACL):: Seohee Yoon and Yong Suk Choi. 2026. COMPEL: Compensated Mixture-of-Experts Pruning with Expert-Layer distribution. In Findings of the Association for Computational Linguistics: ACL 2026, pages 30408–30423, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: COMPEL: Compensated Mixture-of-Experts Pruning with Expert-Layer distribution (Yoon & Choi, Findings 2026)
PDF:: https://aclanthology.org/2026.findings-acl.1521.pdf
Checklist:: 2026.findings-acl.1521.checklist.pdf
