Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning

Kang Zhou, Yuepei Li, Qi Li


Abstract
In this paper, we study the named entity recognition (NER) problem under distant supervision. Because external dictionaries and knowledge bases are incomplete, distantly annotated training data usually suffer from a high false negative rate. To address this, we formulate the Distantly Supervised NER (DS-NER) problem via Multi-class Positive and Unlabeled (MPU) learning and propose a theoretically and practically novel CONFidence-based MPU (Conf-MPU) approach. To handle the incomplete annotations, Conf-MPU consists of two steps. First, a confidence score is estimated for each token, indicating how likely it is to be an entity token. Then, the proposed Conf-MPU risk estimation is applied to train a multi-class classifier for the NER task. Thorough experiments on two benchmark datasets labeled using various external knowledge sources demonstrate the superiority of the proposed Conf-MPU over existing DS-NER methods. Our code is available on GitHub.
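The false negative problem described in the abstract can be illustrated with a minimal sketch of dictionary-based distant labeling. This is not the authors' code: the function `distant_label`, the toy dictionary, and the `"U"` (unlabeled) tag are hypothetical names chosen for illustration, and matching is simplified to single lowercase tokens.

```python
from typing import Dict, List


def distant_label(tokens: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Tag each token with its dictionary entity type, or "U" if unmatched.

    Unmatched tokens are treated as unlabeled rather than negative, which is
    the positive-and-unlabeled view of distantly annotated data.
    """
    return [dictionary.get(token.lower(), "U") for token in tokens]


# Hypothetical toy dictionary; it is deliberately incomplete
# ("ibuprofen" is a chemical but has no entry).
toy_dictionary = {"aspirin": "Chemical", "headache": "Disease"}

tokens = ["Aspirin", "and", "ibuprofen", "relieve", "headache"]
labels = distant_label(tokens, toy_dictionary)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Here "ibuprofen" ends up tagged `U` even though it is an entity. Treating every `U` token as a true non-entity would turn it into a false negative, which is why the unmatched tokens are better regarded as unlabeled.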
Anthology ID:
2022.acl-long.498
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7198–7211
URL:
https://aclanthology.org/2022.acl-long.498
DOI:
10.18653/v1/2022.acl-long.498
Cite (ACL):
Kang Zhou, Yuepei Li, and Qi Li. 2022. Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7198–7211, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Distantly Supervised Named Entity Recognition via Confidence-Based Multi-Class Positive and Unlabeled Learning (Zhou et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.498.pdf
Software:
 2022.acl-long.498.software.zip
Video:
 https://aclanthology.org/2022.acl-long.498.mp4
Code:
 kangISU/Conf-MPU-DS-NER
Data:
 BC5CDR, CoNLL 2003