Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Xiaozhou Ye, Ye Ouyang, Linghe Kong, and Yunxin Liu. 2024. SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Lun-Wei Ku, Andre Martins, and Vivek Srikumar, pages 6710–6720, Bangkok, Thailand. Association for Computational Linguistics. Anthology ID: kong-etal-2024-swapmoe. DOI: 10.18653/v1/2024.acl-long.363. URL: https://aclanthology.org/2024.acl-long.363/