UseClean: learning from complex noisy labels in named entity recognition

Jinjin Tian; Kun Zhou; Meiguo Wang; Yu Zhang; Benjamin Yao; Xiaohu Liu; Chenlei Guo

Abstract

We investigate and refine denoising methods for the NER task on data that may contain extremely noisy labels from multiple sources. In this paper, we first summarize all possible noise types and noise generation schemes, and on that basis we build a thorough evaluation system. Using this evaluation system, we then pinpoint the bottleneck of current state-of-the-art denoising methods. Correspondingly, we propose several refinements: a two-stage framework to avoid error accumulation; a novel confidence score that uses minimal clean supervision to increase predictive power; automatic cutoff fitting to avoid extensive hyper-parameter tuning; and a warm-started weighted partial CRF to learn better from the noisy tokens. Additionally, we propose adaptive sampling to further boost performance in long-tailed entity settings. Across extensive experiments, our method improves the F1 score over the current state of the art by at least 5–10% on average.
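The abstract does not say how the automatic cutoff is fitted. Purely as a generic illustration, and not the authors' procedure, one common way to split tokens into "clean" and "noisy" without hand-tuning a threshold is to fit a two-component Gaussian mixture to per-token confidence scores with EM. A token is then kept as clean when the higher-mean component is its more likely source. The function names, the assumption that higher confidence means "cleaner", and the 0.5 posterior rule are all hypothetical choices made for this sketch.

```python
import math
from typing import List, Tuple


def _log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def _responsibilities(scores, weights, means, variances) -> List[List[float]]:
    """E-step: posterior of each component for each score (log-sum-exp for stability)."""
    resp = []
    for s in scores:
        logs = [math.log(weights[k]) + _log_gauss(s, means[k], variances[k]) for k in range(2)]
        top = max(logs)
        probs = [math.exp(l - top) for l in logs]
        total = sum(probs)
        resp.append([p / total for p in probs])
    return resp


def fit_two_component_gmm(scores: List[float], iters: int = 100) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D Gaussian mixture to confidence scores with EM (hypothetical helper)."""
    n = len(scores)
    mu = sum(scores) / n
    var0 = max(sum((s - mu) ** 2 for s in scores) / n, 1e-6)
    # Initialise one component at each extreme of the score range.
    weights, means, variances = [0.5, 0.5], [min(scores), max(scores)], [var0, var0]
    for _ in range(iters):
        resp = _responsibilities(scores, weights, means, variances)
        # M-step: re-estimate mixing weights, means and variances (with a variance floor).
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            variances[k] = max(sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk, 1e-6)
    return weights, means, variances


def clean_mask(scores: List[float]) -> List[bool]:
    """Assumption: the higher-mean component holds the clean tokens; keep a token if its posterior > 0.5."""
    weights, means, variances = fit_two_component_gmm(scores)
    clean = 0 if means[0] > means[1] else 1
    resp = _responsibilities(scores, weights, means, variances)
    return [r[clean] > 0.5 for r in resp]


scores = [0.05, 0.1, 0.12, 0.08, 0.9, 0.95, 0.88, 0.92, 0.97]
print(clean_mask(scores))  # [False, False, False, False, True, True, True, True, True]
```

The point of fitting a mixture rather than fixing a threshold is that the split adapts to each dataset's score distribution, which is the kind of hyper-parameter saving the abstract refers to.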

Anthology ID:: 2023.clasp-1.14
Volume:: Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
Month:: September
Year:: 2023
Address:: Gothenburg, Sweden
Editors:: Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik
Venue:: CLASP
SIG:: SIGSEM
Publisher:: Association for Computational Linguistics
Pages:: 120–130
URL:: https://aclanthology.org/2023.clasp-1.14/
Cite (ACL):: Jinjin Tian, Kun Zhou, Meiguo Wang, Yu Zhang, Benjamin Yao, Xiaohu Liu, and Chenlei Guo. 2023. UseClean: learning from complex noisy labels in named entity recognition. In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 120–130, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):: UseClean: learning from complex noisy labels in named entity recognition (Tian et al., CLASP 2023)
PDF:: https://aclanthology.org/2023.clasp-1.14.pdf
Optional supplementary material:: 2023.clasp-1.14.OptionalSupplementaryMaterial.pdf