2024
pdf
bib
abs
Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models
Zhuoran Jin
|
Pengfei Cao
|
Hongbang Yuan
|
Yubo Chen
|
Jiexin Xu
|
Huaijun Li
|
Xiaojian Jiang
|
Kang Liu
|
Jun Zhao
Findings of the Association for Computational Linguistics: ACL 2024
Recently, retrieval augmentation and tool augmentation have demonstrated a remarkable capability to expand the internal memory boundaries of language models (LMs) by providing external context. However, internal memory and external context inevitably clash, leading to knowledge conflicts within LMs. In this paper, we aim to interpret the mechanism of knowledge conflicts through the lens of information flow, and then mitigate conflicts by precise interventions at the pivotal point. We find there are some attention heads with opposite effects in the later layers, where memory heads can recall knowledge from internal memory, and context heads can retrieve knowledge from external context. Moreover, we reveal that the pivotal point at which knowledge conflicts emerge in LMs is the integration of inconsistent information flows by memory heads and context heads. Inspired by the insights, we propose a novel method called Pruning Head via PatH PatcHing (PH3), which can efficiently mitigate knowledge conflicts by pruning conflicting attention heads without updating model parameters. PH3 can flexibly control eight LMs to use internal memory (↑ 44.0%) or external context (↑ 38.5%). Moreover, PH3 can also improve the performance of LMs on open-domain QA tasks. We also conduct extensive experiments to demonstrate the cross-model, cross-relation, and cross-format generalization of our method. Our code is publicly available at https://github.com/jinzhuoran/MConflict/.
pdf
bib
abs
LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning
Jiachun Li
|
Pengfei Cao
|
Chenhao Wang
|
Zhuoran Jin
|
Yubo Chen
|
Kang Liu
|
Xiaojian Jiang
|
Jiexin Xu
|
Jun Zhao
Findings of the Association for Computational Linguistics: EMNLP 2024
Large language models (LLMs) sometimes demonstrate poor performance on knowledge-intensive tasks, commonsense reasoning is one of them. Researchers typically address these issues by retrieving related knowledge from knowledge graphs or employing self-enhancement methods to elicit knowledge in LLMs. However, noisy knowledge and invalid reasoning issues hamper their ability to answer questions accurately. To this end, we propose a novel method named eliciting, filtering and integrating knowledge in large language model (LINKED). In it, we design a reward model to filter out the noisy knowledge and take the marginal consistent reasoning module to reduce invalid reasoning. With our comprehensive experiments on two complex commonsense reasoning benchmarks, our method outperforms SOTA baselines (up to 9.0% improvement of accuracy). Besides, to measure the positive and negative impact of the injected knowledge, we propose a new metric called effectiveness-preservation score for the knowledge enhancement works. Finally, through extensive experiments, we conduct an in-depth analysis and find many meaningful conclusions about LLMs in commonsense reasoning tasks.
pdf
bib
abs
Leros: Learning Explicit Reasoning on Synthesized Data for Commonsense Question Answering
Chenhao Wang
|
Pengfei Cao
|
Jiachun Li
|
Yubo Chen
|
Kang Liu
|
Xiaojian Jiang
|
Jiexin Xu
|
Li Qiuxia
|
Jun Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Recent work shows large language models can be prompted to generate useful rationales for commonsense question answering (CQA), which can improve the performance of both themselves and other models. However, the cost of deployment and further tuning is relatively expensive for the large models. Some work explores to distill the the rationale-generation ability to convenient small-sized models, yet it typically requires human-authored QA instances during the distillation. In this paper, we propose a novel framework that leverages both knowledge graphs and large language models to synthesize rationale-augmented CQA data. Based on it, we train Leros, a model that can generate helpful rationales to assist generic QA models to accomplish unseen CQA tasks. Empirical results demonstrate Leros can substantially enhance the performance of QA models on five unseen CQA benchmarks, providing better gains than both same-sized counterpart models trained with downstream data and 10x larger language models. Our work reveals a novel way to integrate knowledge from both knowledge graphs and large language models into smaller models. The codes and synthesized resources are publicly available at https://github.com/wchrepo/leros.
pdf
bib
abs
Tug-of-War between Knowledge: Exploring and Resolving Knowledge Conflicts in Retrieval-Augmented Language Models
Zhuoran Jin
|
Pengfei Cao
|
Yubo Chen
|
Kang Liu
|
Xiaojian Jiang
|
Jiexin Xu
|
Li Qiuxia
|
Jun Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Retrieval-augmented language models (RALMs) have demonstrated significant potential in refining and expanding their internal memory by retrieving evidence from external sources. However, RALMs will inevitably encounter knowledge conflicts when integrating their internal memory with external sources. Knowledge conflicts can ensnare RALMs in a tug-of-war between knowledge, limiting their practical applicability. In this paper, we focus on exploring and resolving knowledge conflicts in RALMs. First, we present an evaluation framework for assessing knowledge conflicts across various dimensions. Then, we investigate the behavior and preference of RALMs from the following two perspectives: (1) Conflicts between internal memory and external sources: We find that stronger RALMs emerge with the Dunning-Kruger effect, persistently favoring their faulty internal memory even when correct evidence is provided. Besides, RALMs exhibit an availability bias towards common knowledge; (2) Conflicts between truthful, irrelevant and misleading evidence: We reveal that RALMs follow the principle of majority rule, leaning towards placing trust in evidence that appears more frequently. Moreover, we find that RALMs exhibit confirmation bias, and are more willing to choose evidence that is consistent with their internal memory. To solve the challenge of knowledge conflicts, we propose a method called Conflict-Disentangle Contrastive Decoding (CD2) to better calibrate the model’s confidence. Experimental results demonstrate that our CD2 can effectively resolve knowledge conflicts in RALMs.
2023
pdf
bib
abs
Event Ontology Completion with Hierarchical Structure Evolution Networks
Pengfei Cao
|
Yupu Hao
|
Yubo Chen
|
Kang Liu
|
Jiexin Xu
|
Huaijun Li
|
Xiaojian Jiang
|
Jun Zhao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Traditional event detection methods require predefined event schemas. However, manually defining event schemas is expensive and the coverage of schemas is limited. To this end, some works study the event type induction (ETI) task, which discovers new event types via clustering. However, the setting of ETI suffers from two limitations: event types are not linked into the existing hierarchy and have no semantic names. In this paper, we propose a new research task named Event Ontology Completion (EOC), which aims to simultaneously achieve event clustering, hierarchy expansion and type naming. Furthermore, we develop a Hierarchical Structure Evolution Network (HalTon) for this new task. Specifically, we first devise a Neighborhood Contrastive Clustering module to cluster unlabeled event instances. Then, we propose a Hierarchy-Aware Linking module to incorporate the hierarchical information for event expansion. Finally, we generate meaningful names for new types via an In-Context Learning-based Naming module. Extensive experiments indicate that our method achieves the best performance, outperforming the baselines by 8.23%, 8.79% and 8.10% of ARI score on three datasets.
pdf
bib
abs
Complex Event Schema Induction with Knowledge-Enriched Diffusion Model
Yupu Hao
|
Pengfei Cao
|
Yubo Chen
|
Kang Liu
|
Jiexin Xu
|
Huaijun Li
|
Xiaojian Jiang
|
Jun Zhao
Findings of the Association for Computational Linguistics: EMNLP 2023
The concept of a complex event schema pertains to the graph structure that represents real-world knowledge of events and their multi-dimensional relationships. However, previous studies on event schema induction have been hindered by challenges such as error propagation and data quality issues. To tackle these challenges, we propose a knowledge-enriched discrete diffusion model. Specifically, we distill the abundant event scenario knowledge of Large Language Models (LLMs) through an object-oriented Python style prompt. We incorporate this knowledge into the training data, enhancing its quality. Subsequently, we employ a discrete diffusion process to generate all nodes and links simultaneously in a non-auto-regressive manner to tackle the problem of error propagation. Additionally, we devise an entity relationship prediction module to complete entity relationships between event arguments. Experimental results demonstrate that our approach achieves outstanding performance across a range of evaluation metrics.
2022
pdf
bib
abs
Augmentation, Retrieval, Generation: Event Sequence Prediction with a Three-Stage Sequence-to-Sequence Approach
Bo Zhou
|
Chenhao Wang
|
Yubo Chen
|
Kang Liu
|
Jun Zhao
|
Jiexin Xu
|
Xiaojian Jiang
|
Qiuxia Li
Proceedings of the 29th International Conference on Computational Linguistics
Being able to infer possible events related to a specific target is critical to natural language processing. One challenging task in this line is event sequence prediction, which aims at predicting a sequence of events given a goal. Currently existing approach models this task as a statistical induction problem, to predict a sequence of events by exploring the similarity between the given goal and the known sequences of events. However, this statistical based approach is complex and predicts a limited variety of events. At the same time this approach ignores the rich knowledge of external events that is important for predicting event sequences. In this paper, in order to predict more diverse events, we first reformulate the event sequence prediction problem as a sequence generation problem. Then to leverage external event knowledge, we propose a three-stage model including augmentation, retrieval and generation. Experimental results on the event sequence prediction dataset show that our model outperforms existing methods, demonstrating the effectiveness of the proposed model.
pdf
bib
abs
Generating Temporally-ordered Event Sequences via Event Optimal Transport
Bo Zhou
|
Yubo Chen
|
Kang Liu
|
Jun Zhao
|
Jiexin Xu
|
Xiaojian Jiang
|
Qiuxia Li
Proceedings of the 29th International Conference on Computational Linguistics
Generating temporally-ordered event sequences in texts is important to natural language processing. Two emerging tasks in this direction are temporal event ordering (rearranging the set of events to correct order) and event infilling (generating an event at a specified position). To tackle the two related tasks, the existing method adopts a vanilla sequence-to-sequence model via maximum likelihood estimation (MLE). However, applying this approach to these tasks will cause two issues. One issue is that the MLE loss emphasizes strict local alignment and ignores the global semantics of the event. The other issue is that the model adopts a word-level objective to model events in texts, failing to evaluate the predicted results of the model from the perspective of event sequence. To alleviate these issues, we present a novel model to tackle the generation of temporally-ordered event sequences via Event Optimal Transport (EOT). First, we treat the events in the sequence as modeling units and explicitly extract the semantics of the events. Second, to provide event sequence-level evaluation of the predicted results of the model, we directly match events in sequences. Extensive experimental results show that our approach outperforms previous models on all evaluation datasets. In particular, the accuracy is improved by 7.7%, and the Macro F1 is improved by 7.2% on one of the datasets.