2024
pdf
bib
abs
XMC-Agent : Dynamic Navigation over Scalable Hierarchical Index for Incremental Extreme Multi-label Classification
Yanjiang Liu
|
Tianyun Zhong
|
Yaojie Lu
|
Hongyu Lin
|
Ben He
|
Shuheng Zhou
|
Huijia Zhu
|
Weiqiang Wang
|
Zhongyi Liu
|
Xianpei Han
|
Le Sun
Findings of the Association for Computational Linguistics: ACL 2024
The eXtreme Multi-label Classification (XMC) aims at accurately assigning large-scale labels to instances, and is challenging for learning, managing, and predicting over the large-scale and rapidly growing set of labels. Traditional XMC methods, like one-vs-all and tree-based methods struggle with the growing set of labels due to their static label assumptions, and embedding-based methods struggle with the complex mapping relationships due to their late-interaction paradigm. In this paper, we propose a large language model (LLM) powered agent framework for extreme multi-label classification – XMC-Agent, which can effectively learn, manage and predict the extremely large and dynamically increasing set of labels. Specifically, XMC-Agent models the extreme multi-label classification task as a dynamic navigation problem, employing a scalable hierarchical label index to effectively manage the unified label space. Additionally, we propose two algorithms to enhance the dynamic navigation capabilities of XMC-Agent: a self-construction algorithm for building the scalable hierarchical index, and an iterative feedback learning algorithm for adjusting the agent to specific tasks. Experiments show that XMC-Agentachieves the state-of-the-art performance on three standard datasets.
pdf
bib
abs
Debiasing In-Context Learning by Instructing LLMs How to Follow Demonstrations
Lvxue Li
|
Jiaqi Chen
|
Xinyu Lu
|
Yaojie Lu
|
Hongyu Lin
|
Shuheng Zhou
|
Huijia Zhu
|
Weiqiang Wang
|
Zhongyi Liu
|
Xianpei Han
|
Le Sun
Findings of the Association for Computational Linguistics: ACL 2024
In-context learning(ICL) has gained considerable attention due to its data efficiency and task adaptability. Unfortunately, ICL suffers from the demonstration bias, i.e., its performance and robustness are severely affected by the selection and ordering of demonstrations. In this paper, we identify that such demonstration bias may primarily stem from the semantic ambiguity induced by demonstrations, i.e., a demonstration may indicate multiple input-to-label mappings and its mapping can be interpreted differently in different contexts by LLMs. Such semantic ambiguity disrupts task comprehension during ICL and results in performance fluctuations. To resolve the semantic ambiguity problem, this paper further proposes two de-biasing strategies to mitigate demonstration bias in in-context learning. Experiments on six datasets show that our methods can effectively alleviate demonstration bias and significantly improve task performance.
pdf
bib
abs
Chain-of-Rewrite: Aligning Question and Documents for Open-Domain Question Answering
Chunlei Xin
|
Yaojie Lu
|
Hongyu Lin
|
Shuheng Zhou
|
Huijia Zhu
|
Weiqiang Wang
|
Zhongyi Liu
|
Xianpei Han
|
Le Sun
Findings of the Association for Computational Linguistics: EMNLP 2024
Despite the advancements made with the retrieve-then-read pipeline on open-domain question answering task, current methods still face challenges stemming from term mismatch and limited interaction between information retrieval systems and large language models. To mitigate these issues, we propose the Chain-of-Rewrite method, which leverages the guidance and feedback gained from the analysis to provide faithful and consistent extensions for effective question answering. Through a two-step rewriting process comprising Semantic Analysis and Semantic Augmentation, the Chain-of-Rewrite method effectively bridges the gap between the user question and relevant documents. By incorporating feedback from the rewriting process, our method can self-correct the retrieval and reading process to further improve the performance. Experiments on four open-domain question answering datasets demonstrate the effectiveness of our system under zero-shot settings.
pdf
bib
abs
Mirror-Consistency: Harnessing Inconsistency in Majority Voting
Siyuan Huang
|
Zhiyuan Ma
|
Jintao Du
|
Changhua Meng
|
Weiqiang Wang
|
Zhouhan Lin
Findings of the Association for Computational Linguistics: EMNLP 2024
Self-Consistency, a widely-used decoding strategy, significantly boosts the reasoning capabilities of Large Language Models (LLMs). However, it depends on the plurality voting rule, which focuses on the most frequent answer while overlooking all other minority responses. These inconsistent minority views often illuminate areas of uncertainty within the model’s generation process. To address this limitation, we present Mirror-Consistency, an enhancement of the standard Self-Consistency approach. Our method incorporates a ‘reflective mirror’ into the self-ensemble decoding process and enables LLMs to critically examine inconsistencies among multiple generations. Additionally, just as humans use the mirror to better understand themselves, we propose using Mirror-Consistency to enhance the sample-based confidence calibration methods, which helps to mitigate issues of overconfidence. Our experimental results demonstrate that Mirror-Consistency yields superior performance in both reasoning accuracy and confidence calibration compared to Self-Consistency.
pdf
bib
abs
Beyond Full Fine-tuning: Harnessing the Power of LoRA for Multi-Task Instruction Tuning
Chunlei Xin
|
Yaojie Lu
|
Hongyu Lin
|
Shuheng Zhou
|
Huijia Zhu
|
Weiqiang Wang
|
Zhongyi Liu
|
Xianpei Han
|
Le Sun
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Low-Rank Adaptation (LoRA) is a widespread parameter-efficient fine-tuning algorithm for large-scale language models. It has been commonly accepted that LoRA mostly achieves promising results in single-task, low-resource settings, and struggles to handle multi-task instruction tuning scenarios. In this paper, we conduct a systematic study of LoRA on diverse tasks and rich resources with different learning capacities, examining its performance on seen tasks during training and its cross-task generalization on unseen tasks. Our findings challenge the prevalent assumption that the limited learning capacity will inevitably result in performance decline. In fact, our study reveals that when configured with an appropriate rank, LoRA can achieve remarkable performance in high-resource and multi-task scenarios, even comparable to that achieved through full fine-tuning. It turns out that the constrained learning capacity encourages LoRA to prioritize conforming to instruction requirements rather than memorizing specialized features of particular tasks or instances. This study reveals the underlying connection between learning capacity and generalization capabilities for robust parameter-efficient fine-tuning, highlighting a promising direction for the broader application of LoRA across various tasks and settings.
pdf
bib
abs
Enhancing Distantly Supervised Named Entity Recognition with Strong Label Guided Lottery Training
Zhiyuan Ma
|
Jintao Du
|
Changhua Meng
|
Weiqiang Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In low-resource Named Entity Recognition (NER) scenarios, only a limited quantity of strongly labeled data is available, while a vast amount of weakly labeled data can be easily acquired through distant supervision. However, weakly labeled data may fail to improve the model performance or even harm it due to the inevitable noise. While training on noisy data, only certain parameters are essential for model learning, termed safe parameters, whereas the other parameters tend to fit noise. In this paper, we propose a noise-robust learning framework where safe parameters can be identified with guidance from the small set of strongly labeled data, and non-safe parameters are suppressed during training on weakly labeled data for better generalization. Our method can effectively mitigate the impact of noise in weakly labeled data, and it can be easily integrated with data level noise-robust learning methods for NER. We conduct extensive experiments on multiple datasets and the results show that our approach outperforms the state-of-the-art methods.
pdf
bib
abs
PASUM: A Pre-training Architecture for Social Media User Modeling Based on Text Graph
Kun Wu
|
Xinyi Mou
|
Lanqing Xue
|
Zhenzhe Ying
|
Weiqiang Wang
|
Qi Zhang
|
Xuanjing Huang
|
Zhongyu Wei
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Modeling social media users is the core of social governance in the digital society. Existing works have incorporated different digital traces to better learn the representations of social media users, including text information encoded by pre-trained language models and social network information encoded by graph models. However, limited by overloaded text information and hard-to-collect social network information, they cannot utilize global text information and cannot be generalized without social relationships. In this paper, we propose a Pre-training Architecture for Social Media User Modeling based on Text Graph(PASUM). We aggregate all microblogs to represent social media users based on the text graph model and learn the mapping from microblogs to user representation. We further design inter-user and intra-user contrastive learning tasks to inject general structural information into the mapping. In different scenarios, we can represent users based on text, even without social network information. Experimental results on various downstream tasks demonstrate the effectiveness and superiority of our framework.