@inproceedings{do-etal-2025-simsmoe,
title = "{S}im{SM}o{E}: Toward Efficient Training Mixture of Experts via Solving Representational Collapse",
author = "Do, Giang and
Le, Hung and
Tran, Truyen",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.107/",
doi = "10.18653/v1/2025.findings-naacl.107",
pages = "2012--2025",
ISBN = "979-8-89176-195-7",
abstract = "Sparse mixture of experts (SMoE) have emerged as an effective approach for scaling large language models while keeping a constant computational cost. Regardless of several notable successes of SMoE, effective training such architecture remains elusive due to the representation collapse problem, which in turn harms model performance and causes parameter redundancy. In this work, we present Similarity-based Sparse Mixture of Experts (SimSMoE), a novel similarity of neural network algorithm, that guarantees a solution to address the representation collapse issue between experts given a fixed FLOPs budget. We conduct extensive empirical evaluations on three large language models for both Pre-training and Fine-tuning tasks to illustrate the efficacy, robustness, and scalability of our method. The results demonstrate that SimSMoE significantly enhances existing routing policy and outperforms other SMoE routing methods in performance for the tasks. Our implementation is publicly available at https://github.com/giangdip2410/SimSMoE."
}