2024
Efficient Knowledge Infusion via KG-LLM Alignment
Zhouyu Jiang | Ling Zhong | Mengshu Sun | Jun Xu | Rui Sun | Hui Cai | Shuhan Luo | Zhiqiang Zhang
Findings of the Association for Computational Linguistics: ACL 2024
To tackle the problem of domain-specific knowledge scarcity within large language models (LLMs), knowledge-graph retrieval augmentation has proven to be an effective and efficient technique for knowledge infusion. However, existing approaches face two primary challenges: knowledge mismatch between publicly available knowledge graphs and the specific domain of the task at hand, and poor information compliance of LLMs with knowledge graphs. In this paper, we leverage a small set of labeled samples and a large-scale corpus to efficiently construct domain-specific knowledge graphs with an LLM, addressing the issue of knowledge mismatch. Additionally, we propose a three-stage KG-LLM alignment strategy to enhance the LLM’s ability to utilize information from knowledge graphs. We conduct experiments in a limited-sample setting on two biomedical question-answering datasets, and the results demonstrate that our approach outperforms existing baselines.
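As a rough illustration of the retrieval-augmented side of this pipeline only — a minimal sketch with hypothetical names, not the paper's implementation, and the three-stage alignment strategy is not reproduced here:

```python
# Minimal sketch of KG-retrieval-augmented prompting (hypothetical names;
# the paper's three-stage KG-LLM alignment is not reproduced here).

def retrieve_triples(kg, question_entities, limit=5):
    """Return (head, relation, tail) triples whose head or tail
    matches an entity mentioned in the question."""
    hits = [t for t in kg if t[0] in question_entities or t[2] in question_entities]
    return hits[:limit]

def build_prompt(question, triples):
    """Serialize retrieved triples as context lines ahead of the question."""
    context = "\n".join(f"{h} -- {r} -> {t}" for h, r, t in triples)
    return f"Knowledge:\n{context}\n\nQuestion: {question}\nAnswer:"

kg = [("aspirin", "treats", "headache"), ("aspirin", "inhibits", "COX-1")]
print(build_prompt("What does aspirin treat?", retrieve_triples(kg, {"aspirin"})))
```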
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
Jintian Zhang | Cheng Peng | Mengshu Sun | Xiang Chen | Lei Liang | Zhiqiang Zhang | Jun Zhou | Huajun Chen | Ningyu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
Despite recent advancements in Large Language Models (LLMs), which have significantly enhanced generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs’ performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first to enable LLMs to conduct vector retrieval during generation.
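The key mechanism is that hidden states at special retrieval-token positions double as dense query embeddings within the same forward pass used for generation. A toy PyTorch sketch of that idea, with made-up shapes and token ids — not the paper's code:

```python
import torch
import torch.nn as nn

# Toy sketch: hidden states at special retrieval-token positions serve as
# query embeddings, so one forward pass covers generation and retrieval.

class ToyLM(nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.body = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        h, _ = self.body(self.emb(ids))       # (B, T, dim) hidden states
        return self.head(h), h                # logits for generation, h for retrieval

RETRIEVAL_TOKEN = 99                          # hypothetical special token id
ids = torch.tensor([[5, 7, RETRIEVAL_TOKEN, 9]])
logits, hidden = ToyLM()(ids)
query_vec = hidden[ids == RETRIEVAL_TOKEN]    # embedding read off the same pass
doc_vecs = torch.randn(10, 32)                # stand-in corpus embeddings
scores = doc_vecs @ query_vec.squeeze(0)      # dot-product retrieval
print(scores.argmax().item())
```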
IEPile: Unearthing Large Scale Schema-Conditioned Information Extraction Corpus
Honghao Gui | Lin Yuan | Hongbin Ye | Ningyu Zhang | Mengshu Sun | Lei Liang | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Large Language Models (LLMs) demonstrate remarkable potential across various domains; however, they exhibit a significant performance gap in Information Extraction (IE). High-quality instruction data is vital for enhancing the specific capabilities of LLMs, while current IE datasets tend to be small in scale, fragmented, and lacking standardized schemas. To this end, we introduce IEPile, a comprehensive bilingual (English and Chinese) IE instruction corpus containing approximately 0.32B tokens. We construct IEPile by collecting and cleaning 33 existing IE datasets, and introduce schema-based instruction generation to unearth a large-scale corpus. Experimentally, IEPile enhances the performance of LLMs on IE, with notable improvements in zero-shot generalization. We open-source the resource and pre-trained models, hoping to provide valuable support to the NLP community.
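Schema-based instruction generation can be pictured as sampling a dynamic slice of the label schema into each instruction, so the model cannot rely on one fixed, memorized label set. A hypothetical sketch under that reading:

```python
import random

# Hypothetical sketch of schema-based instruction generation: each training
# instruction carries a different sampled subset of the label schema.

SCHEMA = ["person", "organization", "location", "date"]

def make_instruction(text, k=2, seed=None):
    rng = random.Random(seed)
    labels = rng.sample(SCHEMA, k)            # dynamic schema slice per example
    return (f"Extract all entities of types {labels} from the text.\n"
            f"Text: {text}\nOutput (JSON):")

print(make_instruction("Marie Curie worked in Paris.", seed=0))
```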
ChatUIE: Exploring Chat-based Unified Information Extraction Using Large Language Models
Jun Xu | Mengshu Sun | Zhiqiang Zhang | Jun Zhou
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Recent advancements in large language models have shown impressive performance in general chat. However, their domain-specific capabilities, particularly in information extraction, have certain limitations. Extracting structured information from natural language that deviates from known schemas or instructions has proven challenging for previous prompt-based methods. This motivated us to explore domain-specific modeling in chat-based language models as a solution for extracting structured information from natural language. In this paper, we present ChatUIE, an innovative unified information extraction framework built upon ChatGLM. Reinforcement learning is employed to improve and align the various tasks that involve confusing and limited samples. Furthermore, we integrate generation constraints to address the issue of generating elements that are not present in the input. Our experimental results demonstrate that ChatUIE can significantly improve the performance of information extraction, at the cost of only a slight decrease in chatting ability.
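One simple reading of the generation-constraint idea — rejecting extracted elements that do not occur in the input — is a post-hoc filter; a hypothetical sketch (the paper's actual mechanism may instead operate during decoding):

```python
# Hypothetical sketch of an input-grounding constraint: drop any extracted
# span that does not literally occur in the source text, so the output
# cannot contain hallucinated elements.

def constrain_extractions(text, extractions):
    """Keep only (type, span) pairs whose span appears verbatim in the text."""
    return [(etype, span) for etype, span in extractions if span in text]

text = "Alice joined Acme Corp in 2021."
raw = [("person", "Alice"), ("organization", "Acme Corp"), ("date", "2020")]
print(constrain_extractions(text, raw))      # "2020" is dropped: not in input
```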
Continual Few-shot Event Detection via Hierarchical Augmentation Networks
Chenlong Zhang | Pengfei Cao | Yubo Chen | Kang Liu | Zhiqiang Zhang | Mengshu Sun | Jun Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Traditional continual event detection relies on abundant labeled data for training, which is often impractical to obtain in real-world applications. In this paper, we introduce continual few-shot event detection (CFED), a more commonly encountered scenario in which a substantial number of labeled samples is not accessible. The CFED task is challenging, as it involves memorizing previous event types and learning new event types from few-shot samples. To mitigate these challenges, we propose a memory-based framework: Hierarchical Augmentation Network (HANet). To memorize previous event types with limited memory, we incorporate prototypical augmentation into the memory set. For the issue of learning new event types in few-shot scenarios, we propose a contrastive augmentation module for token representations. In addition to comparing with previous state-of-the-art methods, we also compare against ChatGPT. Experimental results demonstrate that our method significantly outperforms all of these methods on multiple continual few-shot event detection tasks.
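Prototype-based memory of the kind described can be approximated by storing one mean embedding per seen event type and classifying by nearest prototype — a hypothetical sketch, not the paper's full HANet:

```python
import torch

# Hypothetical sketch of prototype memory for continual few-shot detection:
# each seen event type is summarized by the mean of its few-shot embeddings,
# so old types are retained under a fixed memory budget.

def build_memory(features_by_type):
    """Map event type -> prototype (mean embedding of its examples)."""
    return {etype: feats.mean(dim=0) for etype, feats in features_by_type.items()}

def classify(query, memory):
    """Nearest-prototype classification by cosine similarity."""
    sims = {t: torch.cosine_similarity(query, p, dim=0) for t, p in memory.items()}
    return max(sims, key=sims.get)

mem = build_memory({"attack": torch.randn(5, 16), "transfer": torch.randn(5, 16)})
print(classify(torch.randn(16), mem))
```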
Zero-Shot Cross-Lingual Document-Level Event Causality Identification with Heterogeneous Graph Contrastive Transfer Learning
Zhitao He | Pengfei Cao | Zhuoran Jin | Yubo Chen | Kang Liu | Zhiqiang Zhang | Mengshu Sun | Jun Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Event Causality Identification (ECI) refers to the detection of causal relations between events in texts. However, most existing studies focus on sentence-level ECI in high-resource languages, leaving the more challenging document-level ECI (DECI) in low-resource languages under-explored. In this paper, we propose a Heterogeneous Graph Interaction Model with Multi-granularity Contrastive Transfer Learning (GIMC) for zero-shot cross-lingual document-level ECI. Specifically, we introduce a heterogeneous graph interaction network to model the long-distance dependencies between events that are scattered over a document. Then, to improve the cross-lingual transferability of causal knowledge learned from the source language, we propose a multi-granularity contrastive transfer learning module to align causal representations across languages. Extensive experiments show our framework outperforms the previous state-of-the-art model by 9.4% and 8.2% average F1 score in monolingual and multilingual scenarios, respectively. Notably, in the multilingual scenario, our zero-shot framework even exceeds GPT-3.5 with few-shot learning by 24.3% in overall performance.
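Contrastive alignment of representations across languages, as the transfer module describes, is commonly sketched with an InfoNCE-style loss; a hypothetical single-granularity version (the paper's module is multi-granularity):

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of cross-lingual contrastive alignment: pull each
# source-language event representation toward its target-language counterpart
# and push it away from other events in the batch (InfoNCE-style).

def alignment_loss(src, tgt, temperature=0.1):
    src = F.normalize(src, dim=-1)
    tgt = F.normalize(tgt, dim=-1)
    logits = src @ tgt.T / temperature        # (B, B) similarity matrix
    labels = torch.arange(src.size(0))        # matched pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = alignment_loss(torch.randn(8, 32), torch.randn(8, 32))
print(loss.item())
```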
2023
LEGO: A Multi-agent Collaborative Framework with Role-playing and Iterative Feedback for Causality Explanation Generation
Zhitao He | Pengfei Cao | Yubo Chen | Kang Liu | Ruopeng Li | Mengshu Sun | Jun Zhao
Findings of the Association for Computational Linguistics: EMNLP 2023
Causality Explanation Generation refers to generating a natural-language explanation for a given cause-effect pair. It demands rigorous explicit rationales that demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. This makes the task challenging for large language models, since they often suffer from spurious causal associations when encountering content that does not exist in their memory. In this work, we introduce LEGO, a Multi-agent Collaborative Framework with Role-playing and Iterative Feedback for causality explanation generation. Specifically, we treat each LLM as a malleable LEGO block and use role-playing to assign specific roles to five LLMs. We first devise a Fine-grained World Knowledge Integration Module to augment task-relevant information, alleviating spurious causal associations. We then leverage an Iterative Feedback and Refinement Module to improve the generated explanation through multi-aspect feedback. Extensive experiments on the widely used WIKIWHY and e-CARE datasets show the superiority of our multi-agent framework in reasoning about the causality between cause and effect.
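A role-playing loop with iterative feedback can be sketched generically — one drafting agent, several critics, a refinement step. Roles, stopping rule, and the `llm` callable below are all hypothetical stand-ins, not the paper's five-agent design:

```python
# Hypothetical sketch of a role-playing multi-agent loop with iterative
# feedback: an explainer drafts, critics score from different aspects,
# and the draft is refined until the critics are satisfied.

def run_lego(llm, cause, effect, rounds=3):
    draft = llm(f"As an explainer, explain why '{cause}' leads to '{effect}'.")
    for _ in range(rounds):
        feedback = [llm(f"As a {role} critic, critique: {draft}")
                    for role in ("factuality", "causality", "clarity")]
        if all("OK" in f for f in feedback):  # critics approve; stop refining
            break
        draft = llm(f"Revise the explanation using feedback {feedback}: {draft}")
    return draft

# `llm` is a stand-in for any text-in/text-out model call.
print(run_lego(lambda p: "OK: " + p[:40], "heavy rain", "flooding"))
```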
2022
Extracting Trigger-sharing Events via an Event Matrix
Jun Xu | Weidi Xu | Mengshu Sun | Taifeng Wang | Wei Chu
Findings of the Association for Computational Linguistics: EMNLP 2022
Interest is growing in event extraction, which aims to extract multiple events with their triggers and arguments. Previous methods mitigate the problem of multiple-event extraction by predicting the arguments conditioned on the event trigger and event type, assuming that these arguments belong to a single event. However, this assumption is invalid in general, as there may be multiple events. We therefore present a unified framework called MatEE for trigger-sharing event extraction. It resolves the key bottleneck by effectively modeling the relations between arguments with an event matrix, in which trigger-sharing events are represented by multiple cliques. We verify the proposed method on three widely used benchmark datasets for event extraction. The experimental results show that it outperforms all advanced competitors, significantly improving the state-of-the-art performance in event extraction.
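Event-matrix decoding can be pictured as reading maximal cliques out of a symmetric span-relation matrix, so that events sharing a trigger emerge as overlapping cliques. A hypothetical sketch of that decoding step only, not the paper's model:

```python
import itertools

# Hypothetical sketch of event-matrix decoding: a symmetric binary matrix
# over candidate spans marks which spans co-occur in one event; maximal
# cliques are read out as (possibly trigger-sharing) events.

def decode_events(spans, matrix):
    """Enumerate maximal span subsets that form cliques in the relation matrix."""
    events = []
    for size in range(len(spans), 1, -1):
        for combo in itertools.combinations(range(len(spans)), size):
            if all(matrix[i][j] for i, j in itertools.combinations(combo, 2)):
                if not any(set(combo) <= set(e) for e in events):
                    events.append(combo)      # keep only maximal cliques
    return [[spans[i] for i in e] for e in events]

spans = ["attack(trigger)", "rebels", "convoy", "reporters"]
m = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0], [1, 0, 0, 1]]
print(decode_events(spans, m))  # two events sharing the "attack" trigger
```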