2024
pdf
bib
abs
Incremental Sequence Labeling: A Tale of Two Shifts
Shengjie Qiu
|
Junhao Zheng
|
Zhen Liu
|
Yicheng Luo
|
Qianli Ma
Findings of the Association for Computational Linguistics: ACL 2024
The incremental sequence labeling task involves continuously learning new classes over time while retaining knowledge of the previous ones. Our investigation identifies two significant semantic shifts: E2O (where the model mislabels an old entity as a non-entity) and O2E (where the model labels a non-entity or old entity as a new entity). Previous research has predominantly focused on addressing the E2O problem, neglecting the O2E issue. This negligence results in a model bias towards classifying new data samples as belonging to the new class during the learning process. To address these challenges, we propose a novel framework, Incremental Sequential Labeling without Semantic Shifts (IS3). Motivated by the identified semantic shifts (E2O and O2E), IS3 aims to mitigate catastrophic forgetting in models. As for the E2O problem, we use knowledge distillation to maintain the model’s discriminative ability for old entities. Simultaneously, to tackle the O2E problem, we alleviate the model’s bias towards new entities through debiased loss and optimization levels.Our experimental evaluation, conducted on three datasets with various incremental settings, demonstrates the superior performance of IS3 compared to the previous state-of-the-art method by a significant margin.
pdf
bib
abs
Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models
Junhao Zheng
|
Shengjie Qiu
|
Qianli Ma
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Incremental Learning (IL) has been a long-standing problem in both vision and Natural Language Processing (NLP) communities.In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress in various NLP downstream tasks, utilizing PLMs as backbones has become a common practice in recent research of IL in NLP.Most assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome this issue.However, we find that this assumption is problematic.Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs.Based on the observation, we propose a frustratingly easy method called SEQ* for IL with PLMs.The results show that SEQ* has competitive or superior performance compared with state-of-the-art (SOTA) IL methods yet requires considerably less trainable parameters and training time.These findings urge us to revisit the IL with PLMs and encourage future studies to have a fundamental understanding of the catastrophic forgetting in PLMs.
pdf
bib
abs
Well Begun Is Half Done: An Implicitly Augmented Generative Framework with Distribution Modification for Hierarchical Text Classification
Huawen Feng
|
Jingsong Yan
|
Junlong Liu
|
Junhao Zheng
|
Qianli Ma
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Hierarchical Text Classification (HTC) is a challenging task which aims to extract the labels in a tree structure corresponding to a given text. Discriminative methods usually incorporate the hierarchical structure information into the encoding process, while generative methods decode the features according to it. However, the data distribution varies widely among different categories of samples, but current methods ignore the data imbalance, making the predictions biased and susceptible to error propagation. In this paper, we propose an **IM**plicitly **A**ugmented **G**enerativ **E** framework with distribution modification for hierarchical text classification (**IMAGE**). Specifically, we translate the distributions of original samples along various directions through implicit augmentation to get more diverse data. Furthermore, given the scarcity of the samples of tail classes, we adjust their distributions by transferring knowledge from other classes in label space. In this way, the generative framework learns a better beginning of the feature sequence without a prediction bias and avoids being misled by its wrong predictions for head classes. Experimental results show that **IMAGE** obtains competitive results compared with state-of-the-art methods and prove its superiority on unbalanced data.
2023
pdf
bib
abs
Joint Constrained Learning with Boundary-adjusting for Emotion-Cause Pair Extraction
Huawen Feng
|
Junlong Liu
|
Junhao Zheng
|
Haibin Chen
|
Xichen Shang
|
Qianli Ma
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Emotion-Cause Pair Extraction (ECPE) aims to identify the document’s emotion clauses and corresponding cause clauses. Like other relation extraction tasks, ECPE is closely associated with the relationship between sentences. Recent methods based on Graph Convolutional Networks focus on how to model the multiplex relations between clauses by constructing different edges. However, the data of emotions, causes, and pairs are extremely unbalanced, and current methods get their representation using the same graph structure. In this paper, we propose a **J**oint **C**onstrained Learning framework with **B**oundary-adjusting for Emotion-Cause Pair Extraction (**JCB**). Specifically, through constrained learning, we summarize the prior rules existing in the data and force the model to take them into consideration in optimization, which helps the model learn a better representation from unbalanced data. Furthermore, we adjust the decision boundary of classifiers according to the relations between subtasks, which have always been ignored. No longer working independently as in the previous framework, the classifiers corresponding to three subtasks cooperate under the relation constraints. Experimental results show that **JCB** obtains competitive results compared with state-of-the-art methods and prove its robustness on unbalanced data.
pdf
bib
abs
Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference
Junhao Zheng
|
Qianli Ma
|
Shengjie Qiu
|
Yue Wu
|
Peitian Ma
|
Junlong Liu
|
Huawen Feng
|
Xichen Shang
|
Haibin Chen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Fine-tuning has been proven to be a simple and effective technique to transfer the learned knowledge of Pre-trained Language Models (PLMs) to downstream tasks. However, vanilla fine-tuning easily overfits the target data and degrades the generalization ability. Most existing studies attribute it to catastrophic forgetting, and they retain the pre-trained knowledge indiscriminately without identifying what knowledge is transferable. Motivated by this, we frame fine-tuning into a causal graph and discover that the crux of catastrophic forgetting lies in the missing causal effects from the pre-trained data. Based on the causal view, we propose a unified objective for fine-tuning to retrieve the causality back. Intriguingly, the unified objective can be seen as the sum of the vanilla fine-tuning objective, which learns new knowledge from target data, and the causal objective, which preserves old knowledge from PLMs. Therefore, our method is flexible and can mitigate negative transfer while preserving knowledge. Since endowing models with commonsense is a long-standing challenge, we implement our method on commonsense QA with a proposed heuristic estimation to verify its effectiveness. In the experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets and can be implemented as a plug-in module to inflate the performance of existing QA models.
2022
pdf
bib
abs
Distilling Causal Effect from Miscellaneous Other-Class for Continual Named Entity Recognition
Junhao Zheng
|
Zhanxian Liang
|
Haibin Chen
|
Qianli Ma
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Continual Learning for Named Entity Recognition (CL-NER) aims to learn a growing number of entity types over time from a stream of data. However, simply learning Other-Class in the same way as new entity types amplifies the catastrophic forgetting and leads to a substantial performance drop. The main cause behind this is that Other-Class samples usually contain old entity types, and the old knowledge in these Other-Class samples is not preserved properly. Thanks to the causal inference, we identify that the forgetting is caused by the missing causal effect from the old data.To this end, we propose a unified causal framework to retrieve the causality from both new entity types and Other-Class.Furthermore, we apply curriculum learning to mitigate the impact of label noise and introduce a self-adaptive weight for balancing the causal effects between new entity types and Other-Class. Experimental results on three benchmark datasets show that our method outperforms the state-of-the-art method by a large margin. Moreover, our method can be combined with the existing state-of-the-art methods to improve the performance in CL-NER.
pdf
bib
abs
Cross-domain Named Entity Recognition via Graph Matching
Junhao Zheng
|
Haibin Chen
|
Qianli Ma
Findings of the Association for Computational Linguistics: ACL 2022
Cross-domain NER is a practical yet challenging problem since the data scarcity in the real-world scenario. A common practice is first to learn a NER model in a rich-resource general domain and then adapt the model to specific domains. Due to the mismatch problem between entity types across domains, the wide knowledge in the general domain can not effectively transfer to the target domain NER model. To this end, we model the label relationship as a probability distribution and construct label graphs in both source and target label spaces. To enhance the contextual representation with label structures, we fuse the label graph into the word embedding output by BERT. By representing label relationships as graphs, we formulate cross-domain NER as a graph matching problem. Furthermore, the proposed method has good applicability with pre-training methods and is potentially capable of other cross-domain prediction tasks. Empirical results on four datasets show that our method outperforms a series of transfer learning, multi-task learning, and few-shot learning methods.