Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Yuanlin Duan, Wenqi Jia, Miao Yin, Yu Cheng, and Bo Yuan. 2024. MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition. In Findings of the Association for Computational Linguistics: EMNLP 2024 (Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors), pages 10456–10466, Miami, Florida, USA, November 2024. Association for Computational Linguistics. Anthology ID: yang-etal-2024-moe. DOI: 10.18653/v1/2024.findings-emnlp.612. URL: https://aclanthology.org/2024.findings-emnlp.612/