Mixture of Diverse Size Experts

Manxi Sun; Wei Liu; Jian Luan; Pengzhi Gao; Bin Wang

doi:10.18653/v1/2024.emnlp-industry.118

Abstract

The Sparsely-Activated Mixture-of-Experts (MoE) architecture has gained popularity for scaling large language models (LLMs) because its computational cost grows sub-linearly. Despite its success, most current designs use experts of a single size, so tokens cannot be routed to the expert whose size is most appropriate for generating the next token. To mitigate this defect, we propose Mixture of Diverse Size Experts (MoDSE), a new MoE architecture with layers designed so that experts have different sizes. Analysis of difficult token generation tasks shows that experts with different sizes give better predictions, and that the routing paths of the experts tend to stabilize after a period of training. Diverse expert sizes, however, lead to load imbalance. To tackle this limitation, we introduce an expert-pair allocation strategy that distributes the workload evenly across the GPUs. Comprehensive evaluations across multiple benchmarks demonstrate the effectiveness of MoDSE, which surpasses existing MoEs by adaptively assigning the parameter budget to experts while maintaining the same total parameter size and number of experts.
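The abstract does not say how expert pairs are formed, so the following is only a minimal sketch of one plausible reading: pair the largest remaining expert with the smallest remaining one, so that every pair (and hence every GPU holding one pair) carries roughly the same parameter load. The function names `pair_experts` and `pair_loads` and the example hidden sizes are hypothetical, not taken from the paper.

```python
def pair_experts(sizes):
    """Pair experts so that each pair has a similar combined size.

    Assumption (not from the paper): experts are sorted by size and the
    smallest remaining expert is paired with the largest remaining one.
    Returns a list of (small_index, large_index) tuples.
    """
    if len(sizes) % 2 != 0:
        raise ValueError("expected an even number of experts")
    # Expert indices ordered from smallest to largest size.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    pairs = []
    lo, hi = 0, len(order) - 1
    while lo < hi:
        pairs.append((order[lo], order[hi]))
        lo += 1
        hi -= 1
    return pairs


def pair_loads(sizes, pairs):
    """Combined size of each pair, i.e. the load placed on one device."""
    return [sizes[a] + sizes[b] for a, b in pairs]


# Hypothetical expert hidden sizes; the total budget matches
# eight uniform experts of size 2048.
sizes = [1024, 3072, 1536, 2560, 1792, 2304, 2048, 2048]
pairs = pair_experts(sizes)
loads = pair_loads(sizes, pairs)
print(pairs)
print(loads)
```

With these illustrative sizes every pair sums to the same value, so assigning one pair per GPU would give each device an equal share of expert parameters. Real token loads also depend on routing, which this sketch ignores.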

Anthology ID: 2024.emnlp-industry.118
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1608–1621
URL: https://aclanthology.org/2024.emnlp-industry.118/
DOI: 10.18653/v1/2024.emnlp-industry.118
Cite (ACL): Manxi Sun, Wei Liu, Jian Luan, Pengzhi Gao, and Bin Wang. 2024. Mixture of Diverse Size Experts. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1608–1621, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): Mixture of Diverse Size Experts (Sun et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-industry.118.pdf
Poster: 2024.emnlp-industry.118.poster.pdf
Presentation: 2024.emnlp-industry.118.presentation.pdf
Video: 2024.emnlp-industry.118.video.mov
