Damai Dai, Chengqi Deng, Chenggang Zhao, R.X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y.K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. 2024. DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Lun-Wei Ku, Andre Martins, and Vivek Srikumar, pages 1280-1297, Bangkok, Thailand, August 2024. Association for Computational Linguistics. Conference publication. Anthology ID: dai-etal-2024-deepseekmoe. DOI: 10.18653/v1/2024.acl-long.70. URL: https://aclanthology.org/2024.acl-long.70/