Wang Zihan


2024

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning
Yufei Ma | Zihan Liang | Huangyu Dai | Ben Chen | Dehong Gao | Zhuoran Ran | Wang Zihan | Linbo Jin | Wen Jiang | Guannan Zhang | Xiaoyan Cai | Libin Yang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

The growing demand for larger-scale models in the development of Large Language Models (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter Efficient Fine-Tuning (PEFT) Mixture-of-Experts (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model’s general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.
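
A minimal, hypothetical sketch of the kind of layer the abstract describes: a frozen base projection augmented by a universal LoRA expert, added unconditionally through a residual connection, plus a router-weighted mixture of domain-specific LoRA experts. The class names, rank, and gating scheme below are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative MoDULA-Res-style layer (assumed structure, not the official code).
import torch
import torch.nn as nn


class LoRAExpert(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)   # low-rank "A" projection
        self.up = nn.Linear(rank, d_out, bias=False)    # low-rank "B" projection
        nn.init.zeros_(self.up.weight)                  # start as a no-op, as in standard LoRA

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class MoDULAResLayer(nn.Module):
    def __init__(self, d_in: int, d_out: int, num_domains: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)              # stand-in for a frozen pretrained weight
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.universal = LoRAExpert(d_in, d_out, rank)  # universal expert, trained on general data
        self.domain_experts = nn.ModuleList(
            [LoRAExpert(d_in, d_out, rank) for _ in range(num_domains)]
        )
        self.router = nn.Linear(d_in, num_domains)      # router over domain-specific experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)                        # (..., num_domains)
        domain_out = torch.stack([e(x) for e in self.domain_experts], dim=-1)
        mixed = (domain_out * gate.unsqueeze(-2)).sum(dim=-1)
        # Residual connection: the universal expert is always added, so general
        # capability is kept while domain experts specialise per task.
        return self.base(x) + self.universal(x) + mixed


layer = MoDULAResLayer(d_in=16, d_out=16, num_domains=3)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```

Because the universal expert is added unconditionally and each domain expert is an independent low-rank module, a new domain expert can be trained and plugged in later without retraining the base or universal weights, which matches the pluggability the abstract highlights.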

2023

Learning on Structured Documents for Conditional Question Answering
Wang Zihan | Qian Hongjin | Dou Zhicheng
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

Conditional question answering (CQA) is an important task in natural language processing that involves answering questions that depend on specific conditions. CQA is crucial for domains that require the provision of personalized advice or context-dependent analyses, such as legal consulting and medical diagnosis. However, existing CQA models struggle with generating multiple conditional answers due to two main challenges: (1) the lack of supervised training data with diverse conditions and corresponding answers, and (2) the difficulty of producing output in a complex format that involves multiple conditions and answers. To address the challenge of limited supervision, we propose LSD (Learning on Structured Documents), a self-supervised learning method on structured documents for CQA. LSD involves a conditional problem generation method and a contrastive learning objective. The model is trained with LSD on massive unlabeled structured documents and is fine-tuned on a labeled CQA dataset afterwards. To overcome the limitation of outputting answers with complex formats in CQA, we propose a pipeline that enables the generation of multiple answers and conditions. Experimental results on the ConditionalQA dataset demonstrate that LSD outperforms previous CQA models in terms of accuracy in providing both answers and conditions.
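
As a rough illustration of the contrastive learning objective mentioned above, the sketch below pulls the embedding of a self-supervised query toward the embedding of its source document section and away from the other sections in the batch. The in-batch-negative setup, temperature value, and encoder-agnostic interface are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative in-batch contrastive loss (assumed formulation, not the paper's exact objective).
import torch
import torch.nn.functional as F


def contrastive_loss(query_emb: torch.Tensor, section_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """query_emb, section_emb: (batch, dim); row i of each forms a positive pair."""
    q = F.normalize(query_emb, dim=-1)
    s = F.normalize(section_emb, dim=-1)
    logits = q @ s.t() / temperature          # (batch, batch) cosine-similarity matrix
    labels = torch.arange(q.size(0))          # positives lie on the diagonal
    return F.cross_entropy(logits, labels)    # other sections act as in-batch negatives


loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```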

2020

Multi-task Legal Judgement Prediction Combining a Subtask of Seriousness of Charge
Xu Zhuopeng | Li Xia | Li Yinlin | Wang Zihan | Fanxu Yujie | Lai Xiaoyan
Proceedings of the 19th Chinese National Conference on Computational Linguistics

Legal Judgement Prediction has attracted increasing attention in recent years. One of the challenges is how to design a model whose predictions are more interpretable. Previous studies have proposed different interpretable models based on the generation of court views and the extraction of charge keywords. Different from previous work, we propose a multi-task legal judgement prediction model that incorporates a subtask predicting the seriousness of charges. By introducing this subtask, our model can capture the attention weights of the terms of penalty corresponding to the charges and give more attention to the correct terms of penalty in the fact descriptions. Meanwhile, our model also incorporates the position of the defendant, making it capable of attending to the defendant's contextual information. We carry out several experiments on the public CAIL2018 dataset. Experimental results show that our model achieves better or comparable performance on three subtasks compared with the baseline models. Moreover, we also analyze the interpretability of our model's predictions.
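
A toy sketch of the multi-task setup described above, assuming a shared fact-description encoder with separate heads for charge, term of penalty, and the auxiliary seriousness-of-charge subtask, trained with a weighted sum of cross-entropy losses. The encoder choice, head sizes, and loss weights are illustrative assumptions rather than the paper's configuration.

```python
# Illustrative multi-task judgement model with an auxiliary seriousness-of-charge head.
import torch
import torch.nn as nn


class MultiTaskJudgementModel(nn.Module):
    def __init__(self, hidden: int, n_charges: int, n_terms: int, n_severity: int):
        super().__init__()
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)   # stand-in fact encoder
        self.charge_head = nn.Linear(hidden, n_charges)
        self.term_head = nn.Linear(hidden, n_terms)
        self.severity_head = nn.Linear(hidden, n_severity)        # auxiliary subtask head

    def forward(self, facts: torch.Tensor):
        _, h = self.encoder(facts)        # h: (1, batch, hidden) final hidden state
        rep = h.squeeze(0)                # shared representation of the fact description
        return self.charge_head(rep), self.term_head(rep), self.severity_head(rep)


model = MultiTaskJudgementModel(hidden=32, n_charges=10, n_terms=5, n_severity=3)
facts = torch.randn(4, 20, 32)            # toy (batch, seq_len, feature) fact embeddings
charge_logits, term_logits, severity_logits = model(facts)
# Weighted sum of the three task losses; the 0.5 auxiliary weight is an assumption.
loss = (nn.functional.cross_entropy(charge_logits, torch.randint(10, (4,)))
        + nn.functional.cross_entropy(term_logits, torch.randint(5, (4,)))
        + 0.5 * nn.functional.cross_entropy(severity_logits, torch.randint(3, (4,))))
print(loss.item())
```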