Yufei Ma

2024

The growing demand for larger-scale models in the development of Large Language Models (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter Efficient Fine-Tuning (PEFT) Mixture-of-Expert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model’s general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.

The effectiveness of Large Language Models (LLMs) relies on their capacity to understand instructions and generate human-like responses. However, aligning LLMs with complex human preferences remains a significant challenge due to the potential misinterpretation of user prompts. Current methods for aligning LLM behaviors fall into two categories: output optimization (such as RLHF, RLAIF, and DPO) and input optimization (like OPRO and BPO). While both approaches aim to guide LLMs towards generating responses that align with desired objectives, the labor-intensive and intentions-inconsistent data annotation, as well as the strict and tedious training supervision, make them struggle to yield optimal results across all models. To address these shortcomings, we introduce a novel self-renewal approach called Prompt Optimization with Implicit Reasoning (POIR). It consists of two key components: 1) a model-specific and self-recirculating data collection method that leverages self-evaluation to enhance prompts in accordance with the model’s intrinsic logits, and 2) a prompt rewrite schema that injects implicit reasoning for direct preference learning. Through self-renewal optimization, POIR refines LLM outputs to better align with human preferences across various LLMs and tasks, without relying on supervised fine-tuning. Extensive experiments on a range of LLMs and tasks demonstrate POIR’s superior performance. We believe this advancement offers a novel paradigm for developing LLMs that are more attuned to user intentions.

Co-authors

Venues

emnlp1
findings1

Fix data