2024
pdf
bib
abs
Distill, Fuse, Pre-train: Towards Effective Event Causality Identification with Commonsense-Aware Pre-trained Model
Peixin Huang
|
Xiang Zhao
|
Minghao Hu
|
Zhen Tan
|
Weidong Xiao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Event Causality Identification (ECI) aims to detect causal relations between events in unstructured texts. This task is challenged by the lack of data and explicit causal clues. Some methods incorporate explicit knowledge from external knowledge graphs (KGs) into Pre-trained Language Models (PLMs) to tackle these issues, achieving certain accomplishments. However, they ignore that existing KGs usually contain trivial knowledge which may prejudice the performance. Moreover, they simply integrate the concept triplets, underutilizing the deep interaction between the text and external graph. In this paper, we propose an effective pipeline DFP, i.e., Distill, Fuse and Pre-train, to build a commonsense-aware pre-trained model which integrates reliable task-specific knowledge from commonsense graphs. This pipeline works as follows: (1) To leverage the reliable knowledge, commonsense graph distillation is proposed to distill commonsense graphs and obtain the meta-graph which contain credible task-oriented knowledge. (2) To model the deep interaction between the text and external graph, heterogeneous information fusion is proposed to fuse them through a commonsense-aware memory network. (3) Continual pre-training designs three continual pre-training tasks to further align and fuse the text and the commonsense meta-graph. Through extensive experiments on two benchmarks, we demonstrate the validity of our pipeline.
pdf
bib
abs
SCL: Selective Contrastive Learning for Data-driven Zero-shot Relation Extraction
Ning Pang
|
Xiang Zhao
|
Weixin Zeng
|
Zhen Tan
|
Weidong Xiao
Transactions of the Association for Computational Linguistics, Volume 12
Relation extraction has evolved from supervised relation extraction to zero-shot setting due to the continuous emergence of newly generated relations. Some pioneering works handle zero-shot relation extraction by reformulating it into proxy tasks, such as reading comprehension and textual entailment. Nonetheless, the divergence in proxy task formulations from relation extraction hinders the acquisition of informative semantic representations, leading to subpar performance. Therefore, in this paper, we take a data-driven view to handle zero-shot relation extraction under a three-step paradigm, including encoder training, relation clustering, and summarization. Specifically, to train a discriminative relational encoder, we propose a novel selective contrastive learning framework, namely, SCL, where selective importance scores are assigned to distinguish the importance of different negative contrastive instances. During testing, the prompt-based encoder is employed to map test samples into representation vectors, which are then clustered into several groups. Typical samples closest to the cluster centroid are selected for summarization to generate the predicted relation for all samples in the cluster. Moreover, we design a simple non-parametric threshold plugin to reduce false-positive errors in inference on unseen relation representations. Our experiments demonstrate that SCL outperforms the current state-of-the-art method by over 3% across all metrics.
2023
pdf
bib
abs
Towards Better Representations for Multi-Label Text Classification with Multi-granularity Information
Fangfang Li
|
Puzhen Su
|
Junwen Duan
|
Weidong Xiao
Findings of the Association for Computational Linguistics: EMNLP 2023
Multi-label text classification (MLTC) aims to assign multiple labels to a given text. Previous works have focused on text representation learning and label correlations modeling using pre-trained language models (PLMs). However, studies have shown that PLMs generate word frequency-oriented text representations, causing texts with different labels to be closely distributed in a narrow region, which is difficult to classify. To address this, we present a novel framework CL( ̲Contrastive ̲Learning)-MIL ( ̲Multi-granularity ̲Information ̲Learning) to refine the text representation for MLTC task. We first use contrastive learning to generate uniform initial text representation and incorporate label frequency implicitly. Then, we design a multi-task learning module to integrate multi-granularity (diverse text-labels correlations, label-label relations and label frequency) information into text representations, enhancing their discriminative ability. Experimental results demonstrate the complementarity of the modules in CL-MIL, improving the quality of text representations and yielding stable and competitive improvements for MLTC.
pdf
bib
abs
T 2 -NER: A Two-Stage Span-Based Framework for Unified Named Entity Recognition with Templates
Peixin Huang
|
Xiang Zhao
|
Minghao Hu
|
Zhen Tan
|
Weidong Xiao
Transactions of the Association for Computational Linguistics, Volume 11
Named Entity Recognition (NER) has so far evolved from the traditional flat NER to overlapped and discontinuous NER. They have mostly been solved separately, with only several exceptions that concurrently tackle three tasks with a single model. The current best-performing method formalizes the unified NER as word-word relation classification, which barely focuses on mention content learning and fails to detect entity mentions comprising a single word. In this paper, we propose a two-stage span-based framework with templates, namely, T2-NER, to resolve the unified NER task. The first stage is to extract entity spans, where flat and overlapped entities can be recognized. The second stage is to classify over all entity span pairs, where discontinuous entities can be recognized. Finally, multi-task learning is used to jointly train two stages. To improve the efficiency of span-based model, we design grouped templates and typed templates for two stages to realize batch computations. We also apply an adjacent packing strategy and a latter packing strategy to model discriminative boundary information and learn better span (pair) representation. Moreover, we introduce the syntax information to enhance our span representation. We perform extensive experiments on eight benchmark datasets for flat, overlapped, and discontinuous NER, where our model beats all the current competitive baselines, obtaining the best performance of unified NER.
2022
pdf
bib
abs
Extract-Select: A Span Selection Framework for Nested Named Entity Recognition with Generative Adversarial Training
Peixin Huang
|
Xiang Zhao
|
Minghao Hu
|
Yang Fang
|
Xinyi Li
|
Weidong Xiao
Findings of the Association for Computational Linguistics: ACL 2022
Nested named entity recognition (NER) is a task in which named entities may overlap with each other. Span-based approaches regard nested NER as a two-stage span enumeration and classification task, thus having the innate ability to handle this task. However, they face the problems of error propagation, ignorance of span boundary, difficulty in long entity recognition and requirement on large-scale annotated data. In this paper, we propose Extract-Select, a span selection framework for nested NER, to tackle these problems. Firstly, we introduce a span selection framework in which nested entities with different input categories would be separately extracted by the extractor, thus naturally avoiding error propagation in two-stage span-based approaches. In the inference phase, the trained extractor selects final results specific to the given entity category. Secondly, we propose a hybrid selection strategy in the extractor, which not only makes full use of span boundary but also improves the ability of long entity recognition. Thirdly, we design a discriminator to evaluate the extraction result, and train both extractor and discriminator with generative adversarial training (GAT). The use of GAT greatly alleviates the stress on the dataset size. Experimental results on four benchmark datasets demonstrate that Extract-Select outperforms competitive nested NER models, obtaining state-of-the-art results. The proposed model also performs well when less labeled data are given, proving the effectiveness of GAT.
2021
pdf
bib
abs
Relation-aware Bidirectional Path Reasoning for Commonsense Question Answering
Junxing Wang
|
Xinyi Li
|
Zhen Tan
|
Xiang Zhao
|
Weidong Xiao
Proceedings of the 25th Conference on Computational Natural Language Learning
Commonsense Question Answering is an important natural language processing (NLP) task that aims to predict the correct answer to a question through commonsense reasoning. Previous studies utilize pre-trained models on large-scale corpora such as BERT, or perform reasoning on knowledge graphs. However, these methods do not explicitly model the relations that connect entities, which are informational and can be used to enhance reasoning. To address this issue, we propose a relation-aware reasoning method. Our method uses a relation-aware graph neural network to capture the rich contextual information from both entities and relations. Compared with methods that use fixed relation embeddings from pre-trained models, our model dynamically updates relations with contextual information from a multi-source subgraph, built from multiple external knowledge sources. The enhanced representations of relations are then fed to a bidirectional reasoning module. A bidirectional attention mechanism is applied between the question sequence and the paths that connect entities, which provides us with transparent interpretability. Experimental results on the CommonsenseQA dataset illustrate that our method results in significant improvements over the baselines while also providing clear reasoning paths.
2020
pdf
bib
abs
Joint Event Extraction with Hierarchical Policy Network
Peixin Huang
|
Xiang Zhao
|
Ryuichi Takanobu
|
Zhen Tan
|
Weidong Xiao
Proceedings of the 28th International Conference on Computational Linguistics
Most existing work on event extraction (EE) either follows a pipelined manner or uses a joint structure but is pipelined in essence. As a result, these efforts fail to utilize information interactions among event triggers, event arguments, and argument roles, which causes information redundancy. In view of this, we propose to exploit the role information of the arguments in an event and devise a Hierarchical Policy Network (HPNet) to perform joint EE. The whole EE process is fulfilled through a two-level hierarchical structure consisting of two policy networks for event detection and argument detection. The deep information interactions among the subtasks are realized, and it is more natural to deal with multiple events issue. Extensive experiments on ACE2005 and TAC2015 demonstrate the superiority of HPNet, leading to state-of-the-art performance and is more powerful for sentences with multiple events.