Wenqi Jia
2024
MoE-I2: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Cheng Yang | Yang Sui | Jinqi Xiao | Lingyi Huang | Yu Gong | Yuanlin Duan | Wenqi Jia | Miao Yin | Yu Cheng | Bo Yuan
Findings of the Association for Computational Linguistics: EMNLP 2024
The emergence of Mixture-of-Experts (MoE) LLMs has significantly advanced the development of language models. Compared to traditional LLMs, MoE LLMs achieve higher performance with considerably fewer activated parameters. Despite this efficiency, their enormous total parameter count still leads to high deployment costs. In this paper, we introduce a two-stage compression method tailored for MoE to reduce the model size and decrease the computational cost. First, in the inter-expert pruning stage, we analyze the importance of each layer and propose Layer-wise Genetic Search and Block-wise KT-Reception Field with non-uniform pruning ratios to prune individual experts. Second, in the intra-expert decomposition stage, we apply low-rank decomposition to further compress the parameters within the remaining experts. Extensive experiments on Qwen1.5-MoE-A2.7B, DeepSeek-V2-Lite, and Mixtral-8×7B demonstrate that our proposed methods both reduce the model size and enhance inference efficiency while maintaining performance on various zero-shot tasks.
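As a rough illustration of the intra-expert decomposition stage, the sketch below compresses a single expert's weight matrix with a truncated SVD, a standard form of low-rank decomposition. The function name, matrix shapes, and rank choice are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def low_rank_decompose(W: np.ndarray, rank: int):
    """Factor an expert weight matrix W (d_out x d_in) into two smaller
    matrices A (d_out x rank) and B (rank x d_in) via truncated SVD,
    so that A @ B approximates W with fewer total parameters."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Hypothetical example: compress a 4096x1024 expert projection to rank 256.
# Parameter count drops from 4096*1024 to (4096+1024)*256.
W = np.random.randn(4096, 1024).astype(np.float32)
A, B = low_rank_decompose(W, rank=256)
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

At inference time the single dense matmul is replaced by two smaller ones, which is what makes the remaining experts cheaper after pruning.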
2023
Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games
Bolin Lai | Hongxin Zhang | Miao Liu | Aryan Pariani | Fiona Ryan | Wenqi Jia | Shirley Anugrah Hayati | James Rehg | Diyi Yang
Findings of the Association for Computational Linguistics: ACL 2023
Persuasion modeling is a key building block for conversational agents. Existing work in this direction is limited to analyzing textual dialogue corpora. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance-level annotations of persuasion strategy, and game-level annotations of deduction game outcomes. We provide extensive experiments showing how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset can be found at https://persuasion-deductiongame.socialai-data.org. The code and models are available at https://github.com/SALT-NLP/PersuationGames.