MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Yufei Ma; Zihan Liang; Huangyu Dai; Ben Chen; Dehong Gao; Zhuoran Ran; Wang Zihan; Linbo Jin; Wen Jiang; Guannan Zhang; Xiaoyan Cai; Libin Yang

doi:10.18653/v1/2024.emnlp-main.161

Abstract

The growing demand for larger-scale models in the development of Large Language Models (LLMs) poses challenges for efficient training under limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter-Efficient Fine-Tuning (PEFT) Mixture-of-Experts (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model’s general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements on multiple tasks while reducing training costs by over 80% without losing general capability. Moreover, MoDULA is flexibly pluggable, allowing new tasks to be added efficiently without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.
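The abstract only describes the architecture in words, so the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes a single linear layer with a frozen base weight, one universal LoRA expert whose output is added as a residual, and several domain-specific LoRA experts mixed by a softmax router. The function names (`matvec`, `lora_delta`, `modula_res_forward`), the shapes, and the exact wiring of the residual connection are all assumptions made for illustration; the paper's actual layer may differ.

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_delta(A, B, x, scale=1.0):
    """Low-rank update B @ (A @ x); A is r x d_in, B is d_out x r."""
    return [scale * v for v in matvec(B, matvec(A, x))]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def modula_res_forward(x, W, universal, domain_experts, router):
    """Hypothetical forward pass: base + universal residual + routed domain experts.

    Assumption: the universal expert always contributes (residual path), while
    the router assigns a weight to each domain-specific expert.
    """
    base = matvec(W, x)                      # frozen pretrained weight
    uni = lora_delta(universal[0], universal[1], x)
    gates = softmax(matvec(router, x))       # one gate per domain expert
    dom = [0.0] * len(base)
    for g, (A, B) in zip(gates, domain_experts):
        delta = lora_delta(A, B, x)
        dom = [acc + g * d for acc, d in zip(dom, delta)]
    out = [b + u + d for b, u, d in zip(base, uni, dom)]
    return out, gates

# Toy example: two domain experts whose B matrices are zero, as in the usual
# LoRA initialisation, so they do not yet change the output.
W = [[1.0, 0.0], [0.0, 1.0]]
universal = ([[1.0, 1.0]], [[0.5], [0.5]])
domain_experts = [
    ([[1.0, 0.0]], [[0.0], [0.0]]),
    ([[0.0, 1.0]], [[0.0], [0.0]]),
]
router = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]
y, gates = modula_res_forward(x, W, universal, domain_experts, router)
print(y, gates)
```

Under this reading, the separate training stages mentioned in the abstract would correspond to updating only `universal`, then only `domain_experts`, then only `router`, with everything else frozen at each stage. Adding a new task would then mean appending one more entry to `domain_experts` and one more row to `router`. The sketch covers the forward pass only and does not model training.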

Anthology ID: 2024.emnlp-main.161
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2758–2770
URL: https://aclanthology.org/2024.emnlp-main.161/
DOI: 10.18653/v1/2024.emnlp-main.161
Cite (ACL): Yufei Ma, Zihan Liang, Huangyu Dai, Ben Chen, Dehong Gao, Zhuoran Ran, Wang Zihan, Linbo Jin, Wen Jiang, Guannan Zhang, Xiaoyan Cai, and Libin Yang. 2024. MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2758–2770, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning (Ma et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.161.pdf
