Dongsu Shen


2024

WkNER: Enhancing Named Entity Recognition with Word Segmentation Constraints and kNN Retrieval
Yanchun Li | Senlin Deng | Dongsu Shen | Shujuan Tian | Saiqin Long
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Fine-tuning Pre-trained Language Models (PLMs) is a popular Natural Language Processing (NLP) paradigm for Named Entity Recognition (NER). However, neural network models often generalize poorly because of significant disparities between the knowledge learned by PLMs and the distribution of the target dataset, as well as data scarcity. In addition, token omission in predictions, caused by insufficient learning, remains a challenge in NER. In this paper, we propose WkNER, a kNN retrieval enhancement algorithm that incorporates word segmentation information to improve the model’s generalization ability and alleviate the problem of missing entity tokens in predictions. First, word segmentation information is used to preliminarily determine entity boundaries, which reduces the fine-tuned model’s common error of omitting tokens within entities. Second, we find that non-entities in the retrieval table contain a large amount of redundant information, and we explore how introducing non-entity information at different scales affects the model. Experimental results show that our proposed method significantly improves the performance of baseline models and achieves better or comparable recognition accuracy relative to previous state-of-the-art models on multiple public Chinese and English datasets. In low-resource scenarios in particular, our method achieves higher accuracy using 20% of the dataset than the original method achieves using the full dataset.
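
The abstract does not spell out the retrieval procedure, so the following is only a minimal sketch of generic kNN-augmented token classification, not the WkNER implementation. It assumes a datastore of (token representation, label) pairs, a softmax over negative squared distances, and a fixed interpolation weight; the function names, the parameters `k`, `temperature`, `lam`, and `keep_ratio`, and the toy vectors are all hypothetical. The word segmentation constraint is not modeled here.

```python
import math
import random


def knn_label_distribution(query, datastore, labels, k=3, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over labels."""
    # Squared Euclidean distance from the query to every stored vector.
    scored = sorted(
        (sum((q - v) ** 2 for q, v in zip(query, vec)), label)
        for vec, label in datastore
    )[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    probs = {label: 0.0 for label in labels}
    for weight, (_, label) in zip(weights, scored):
        probs[label] += weight / total
    return probs


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the fine-tuned model's distribution with the retrieved one."""
    return {
        label: lam * knn_probs[label] + (1 - lam) * model_probs[label]
        for label in model_probs
    }


def downsample_non_entities(datastore, keep_ratio, seed=0, non_entity="O"):
    """Keep all entity entries and a random fraction of non-entity entries."""
    rng = random.Random(seed)
    return [
        (vec, label)
        for vec, label in datastore
        if label != non_entity or rng.random() < keep_ratio
    ]


# Toy data (assumed): 2-d "representations" with BIO-style labels.
labels = ["O", "B-PER"]
datastore = [
    ([0.0, 0.0], "O"),
    ([1.0, 1.0], "B-PER"),
    ([0.9, 1.1], "B-PER"),
    ([0.1, -0.1], "O"),
]
model_probs = {"O": 0.6, "B-PER": 0.4}  # the model alone would predict "O"

knn_probs = knn_label_distribution([1.0, 0.9], datastore, labels, k=2)
final = interpolate(model_probs, knn_probs, lam=0.5)
prediction = max(final, key=final.get)
```

In this toy case both nearest neighbours carry the entity label, so the interpolated distribution favours `B-PER` even though the model alone leans toward `O`. Varying `keep_ratio` in `downsample_non_entities` is one simple way to experiment with how much non-entity information the datastore holds.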