Yufei Ma


2024

pdf bib
MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning
Yufei Ma | Zihan Liang | Huangyu Dai | Ben Chen | Dehong Gao | Zhuoran Ran | Wang Zihan | Linbo Jin | Wen Jiang | Guannan Zhang | Xiaoyan Cai | Libin Yang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The growing demand for larger-scale models in the development of Large Language Models (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter Efficient Fine-Tuning (PEFT) Mixture-of-Expert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model’s general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.

pdf bib
Self-Renewal Prompt Optimizing with Implicit Reasoning
Zihan Liang | Ben Chen | Zhuoran Ran | Zihan Wang | Huangyu Dai | Yufei Ma | Dehong Gao | Xiaoyan Cai | Libin Yang
Findings of the Association for Computational Linguistics: EMNLP 2024

The effectiveness of Large Language Models (LLMs) relies on their capacity to understand instructions and generate human-like responses. However, aligning LLMs with complex human preferences remains a significant challenge due to the potential misinterpretation of user prompts. Current methods for aligning LLM behaviors fall into two categories: output optimization (such as RLHF, RLAIF, and DPO) and input optimization (like OPRO and BPO). While both approaches aim to guide LLMs towards generating responses that align with desired objectives, the labor-intensive and intentions-inconsistent data annotation, as well as the strict and tedious training supervision, make them struggle to yield optimal results across all models. To address these shortcomings, we introduce a novel self-renewal approach called Prompt Optimization with Implicit Reasoning (POIR). It consists of two key components: 1) a model-specific and self-recirculating data collection method that leverages self-evaluation to enhance prompts in accordance with the model’s intrinsic logits, and 2) a prompt rewrite schema that injects implicit reasoning for direct preference learning. Through self-renewal optimization, POIR refines LLM outputs to better align with human preferences across various LLMs and tasks, without relying on supervised fine-tuning. Extensive experiments on a range of LLMs and tasks demonstrate POIR’s superior performance. We believe this advancement offers a novel paradigm for developing LLMs that are more attuned to user intentions.