2024
MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning
Yufei Ma, Zihan Liang, Huangyu Dai, Ben Chen, Dehong Gao, Zhuoran Ran, Wang Zihan, Linbo Jin, Wen Jiang, Guannan Zhang, Xiaoyan Cai, Libin Yang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The growing demand for larger-scale models in the development of Large Language Models (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter Efficient Fine-Tuning (PEFT) Mixture-of-Expert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model’s general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.
Self-Renewal Prompt Optimizing with Implicit Reasoning
Zihan Liang, Ben Chen, Zhuoran Ran, Zihan Wang, Huangyu Dai, Yufei Ma, Dehong Gao, Xiaoyan Cai, Libin Yang
Findings of the Association for Computational Linguistics: EMNLP 2024
The effectiveness of Large Language Models (LLMs) relies on their capacity to understand instructions and generate human-like responses. However, aligning LLMs with complex human preferences remains a significant challenge due to the potential misinterpretation of user prompts. Current methods for aligning LLM behaviors fall into two categories: output optimization (such as RLHF, RLAIF, and DPO) and input optimization (like OPRO and BPO). While both approaches aim to guide LLMs towards generating responses that align with desired objectives, the labor-intensive and intentions-inconsistent data annotation, as well as the strict and tedious training supervision, make them struggle to yield optimal results across all models. To address these shortcomings, we introduce a novel self-renewal approach called Prompt Optimization with Implicit Reasoning (POIR). It consists of two key components: 1) a model-specific and self-recirculating data collection method that leverages self-evaluation to enhance prompts in accordance with the model’s intrinsic logits, and 2) a prompt rewrite schema that injects implicit reasoning for direct preference learning. Through self-renewal optimization, POIR refines LLM outputs to better align with human preferences across various LLMs and tasks, without relying on supervised fine-tuning. Extensive experiments on a range of LLMs and tasks demonstrate POIR’s superior performance. We believe this advancement offers a novel paradigm for developing LLMs that are more attuned to user intentions.
2022
Dependency Position Encoding for Relation Extraction
Qiushi Guo, Xin Wang, Dehong Gao
Findings of the Association for Computational Linguistics: NAACL 2022
Leveraging the dependency tree of the input sentence can improve model performance for relation extraction. A challenging issue is how to remove confusing information from the tree. Efforts have been made to utilize the dependency connections between words to selectively emphasize target-relevant information. However, these approaches make only limited use of dependency types. In this paper, we propose dependency position encoding (DPE), an efficient way of incorporating both dependency connections and dependency types into the self-attention mechanism to distinguish the importance of different word dependencies for the task. In contrast to previous studies that process the input sentence and dependency information in separate streams, DPE can be seamlessly incorporated into the Transformer and makes it possible to use a one-stream scheme to extract relations between entity pairs. Extensive experiments show that models with our DPE significantly outperform previous methods on SemEval 2010 Task 8, KBP37, and TACRED.
2020
Deep Hierarchical Classification for Category Prediction in E-commerce System
Dehong Gao
Proceedings of the 3rd Workshop on e-Commerce and NLP
In e-commerce systems, category prediction is the task of automatically predicting the categories of given texts. Unlike traditional classification, where there are no relations between classes, category prediction is regarded as a standard hierarchical classification problem, since categories are usually organized as a hierarchical tree. In this paper, we address hierarchical category prediction. We propose a Deep Hierarchical Classification framework, which incorporates multi-scale hierarchical information in neural networks and introduces a representation sharing strategy according to the category tree. We also define a novel combined loss function to penalize hierarchical prediction errors. The evaluation shows that the proposed approach outperforms existing approaches in accuracy.
2015
Cross-lingual Sentiment Lexicon Learning With Bilingual Word Graph Label Propagation
Dehong Gao, Furu Wei, Wenjie Li, Xiaohua Liu, Ming Zhou
Computational Linguistics, Volume 41, Issue 1 - March 2015
2013
Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
Dehong Gao, Wenjie Li, Renxian Zhang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2012
Efficient Feedback-based Feature Learning for Blog Distillation as a Terabyte Challenge
Dehong Gao, Wenjie Li, Renxian Zhang
Proceedings of COLING 2012: Demonstration Papers
Beyond Twitter Text: A Preliminary Study on Twitter Hyperlink and its Application
Dehong Gao, Wenjie Li, Renxian Zhang
Proceedings of COLING 2012: Demonstration Papers
Towards Scalable Speech Act Recognition in Twitter: Tackling Insufficient Training Data
Renxian Zhang, Dehong Gao, Wenjie Li
Proceedings of the Workshop on Semantic Analysis in Social Media
2011
Simultaneous Clustering and Noise Detection for Theme-based Summarization
Xiaoyan Cai, Renxian Zhang, Dehong Gao, Wenjie Li
Proceedings of 5th International Joint Conference on Natural Language Processing