Coarse-to-Fine Pre-training for Named Entity Recognition

Xue Mengge; Bowen Yu; Zhenyu Zhang; Tingwen Liu; Yue Zhang; Bin Wang

doi:10.18653/v1/2020.emnlp-main.514

Abstract

Recently, Named Entity Recognition has achieved great advances aided by pre-training approaches such as BERT. However, current pre-training techniques focus on building language modeling objectives to learn a general representation, ignoring named entity-related knowledge. To this end, we propose a NER-specific pre-training framework to inject coarse-to-fine automatically mined entity knowledge into pre-trained models. Specifically, we first warm up the model via an entity span identification task by training it with Wikipedia anchors, which can be deemed general-typed entities. Then we leverage the gazetteer-based distant supervision strategy to train the model to extract coarse-grained typed entities. Finally, we devise a self-supervised auxiliary task to mine the fine-grained named entity knowledge via clustering. Empirical studies on three public NER datasets demonstrate that our framework achieves significant improvements against several pre-trained baselines, establishing the new state-of-the-art performance on three benchmarks. Besides, we show that our framework gains promising results without using human-labeled training data, demonstrating its effectiveness in label-few and low-resource scenarios.

Anthology ID: 2020.emnlp-main.514
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6345–6354
URL: https://aclanthology.org/2020.emnlp-main.514
DOI: 10.18653/v1/2020.emnlp-main.514
Cite (ACL): Xue Mengge, Bowen Yu, Zhenyu Zhang, Tingwen Liu, Yue Zhang, and Bin Wang. 2020. Coarse-to-Fine Pre-training for Named Entity Recognition. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6345–6354, Online. Association for Computational Linguistics.
Cite (Informal): Coarse-to-Fine Pre-training for Named Entity Recognition (Mengge et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.514.pdf
Video: https://slideslive.com/38938977
Code: strawberryx/CoFEE
