Rongsheng Li

2024

Depth Aware Hierarchical Replay Continual Learning for Knowledge Based Question Answering
Zhixiong Cao | Hai-Tao Zheng | Yangning Li | Jin Xu | Rongsheng Li | Hong-Gee Kim
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Continual learning is an emerging area of machine learning that deals with the issue where models adapt well to the latest data but lose the ability to remember past data due to changes in the data source. A widely adopted solution is by keeping a small memory of previous learned data that use replay. Most of the previous studies on continual learning focused on classification tasks, such as image classification and text classification, where the model needs only to categorize the input data. Inspired by the human ability to incrementally learn knowledge and solve different problems using learned knowledge, we considered a more pratical scenario, knowledge based quesiton answering about continual learning. In this scenario, each single question is different from others(means different fact trippes to answer them) while classification tasks only need to find feature boundaries of different categories, which are the curves or surfaces that separate different categories in the feature space. To address this issue, we proposed a depth aware hierarchical replay framework which include a tree structure classfier to have a sense of knowledge distribution and fill the gap between text classfication tasks and question-answering tasks for continual learning, a local sampler to grasp these critical samples and a depth aware learning network to reconstructe the feature space of a single learning round. In our experiments, we have demonstrated that our proposed model outperforms previous continual learning methods in mitigating the issue of catastrophic forgetting.

pdf bib abs

Training a Better Chinese Spelling Correction Model via Prior-knowledge Guided Teacher
Chi Wei | Shaobin Huang | Rongsheng Li | Naiyu Yan | Rui Wang
Findings of the Association for Computational Linguistics: ACL 2024

Recent advancements in Chinese Spelling Correction (CSC) predominantly leverage pre-trained language models (PLMs). However, a notable challenge with fine-tuned PLM-based CSC models is their tendency to over-correct, leading to poor generalization for error patterns outside the standard distribution. To address this, we developed a teacher network guided by prior knowledge for distillation learning of CSC models. Unlike traditional teacher networks, which depend on task-related pre-training, our method infuses task-related prior information into the teacher network, offering guidance beyond mere labels to the student network. This strategy significantly enhances the CSC model’s language modeling capabilities, crucial for minimizing over-correction. Importantly, our approach is model-independent and the teacher network does not require task-related pre-training, making it broadly applicable for enhancing various PLM-based CSC models with minimal additional computational resources. Extensive experiments on widely used benchmarks demonstrate that our method achieves new state-of-the-art results. Additionally, we explored the potential of generalizing our method to other non-autoregressive text-generation tasks.

pdf bib abs

Metaphor Detection with Context Enhancement and Curriculum Learning
Kaidi Jia | Rongsheng Li
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Metaphor detection is a challenging task for natural language processing (NLP) systems. Previous works failed to sufficiently utilize the internal and external semantic relationships between target words and their context. Furthermore, they have faced challenges in tackling the problem of data sparseness due to the very limited available training data. To address these two challenges, we propose a novel model called MiceCL. By leveraging the difference between the literal meaning of the target word and the meaning of the sentence as the sentence external difference, MiceCL can better handle the semantic relationships. Additionally, we propose a curriculum learning framework for automatically assessing difficulty of the sentence with a pre-trained model. By starting from easy examples and gradually progressing to more difficult ones, we can ensure that the model will not deal with complex data when its ability is weak so that to avoid wasting limited data. Experimental results demonstrate that MiceCL achieves competitive performance across multiple datasets, with a significantly improved convergence speed compared to other models.

Co-authors

Rui Wang 1

Chi Wei 1

Jin Xu 1

Naiyu Yan 1

Hai-Tao Zheng 1

Venues

Fix author