Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching

Zhuoran Jin; Pengfei Cao; Zhitao He; Yubo Chen (陈玉博); Kang Liu; Jun Zhao

doi:10.18653/v1/2023.findings-emnlp.974

Abstract

Despite significant progress in named entity recognition (NER) models, scaling to newly emerging entity types remains challenging in real-world scenarios. Continual learning and zero-shot learning approaches have been explored to handle such types with less human supervision, but they have not been adopted as successfully as supervised approaches. Meanwhile, humans have a much larger vocabulary than these approaches and effortlessly learn the alignment between entities and concepts through natural supervision. In this paper, we consider a more realistic and challenging setting, open-vocabulary named entity recognition (OVNER), to imitate human-level ability. OVNER aims to recognize entities of novel types from their textual names or descriptions. Specifically, we formulate OVNER as a semantic matching task and propose a novel and scalable two-stage method called Context-Type SemAntiC Alignment and FusiOn (CACAO). In the pre-training stage, we adopt a Dual-Encoder for context-type semantic alignment and pre-train it on 80M context-type pairs, which are easily accessible through natural supervision. In the fine-tuning stage, we use a Cross-Encoder for context-type semantic fusion and fine-tune it on base types with human supervision. Experimental results show that our method outperforms previous state-of-the-art methods on three challenging OVNER benchmarks by 9.7%, 9.5%, and 1.8% F1-score on novel types. Moreover, CACAO also demonstrates flexible transfer ability in cross-domain NER.
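
The difference between the two matching styles named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the bag-of-words "encoder", the overlap-based joint scorer, and all function names are assumptions made purely for illustration, standing in for the neural Dual-Encoder and Cross-Encoder. The sketch only shows the structural contrast: alignment scores a context and a type description that were encoded independently, while fusion scores the pair jointly.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for a neural encoder: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dual_encoder_score(context, type_description):
    """Alignment style: encode each side independently, then compare."""
    return cosine(bag_of_words(context), bag_of_words(type_description))


def cross_encoder_score(context, type_description):
    """Fusion style: score the pair jointly.

    Toy assumption: the fraction of type-description tokens that also
    occur in the context, computed from both inputs at once.
    """
    context_words = set(context.lower().split())
    type_words = type_description.lower().split()
    if not type_words:
        return 0.0
    return sum(w in context_words for w in type_words) / len(type_words)


def match_type(context, type_descriptions, scorer):
    """Return the type name whose description best matches the context."""
    return max(
        type_descriptions,
        key=lambda name: scorer(context, type_descriptions[name]),
    )
```

Because types are represented only by their names or descriptions, a new type can be added to `type_descriptions` at inference time without retraining, which is the property the open-vocabulary setting relies on.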

Anthology ID: 2023.findings-emnlp.974
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14616–14637
URL: https://aclanthology.org/2023.findings-emnlp.974/
DOI: 10.18653/v1/2023.findings-emnlp.974
Cite (ACL): Zhuoran Jin, Pengfei Cao, Zhitao He, Yubo Chen, Kang Liu, and Jun Zhao. 2023. Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14616–14637, Singapore. Association for Computational Linguistics.
Cite (Informal): Alignment Precedes Fusion: Open-Vocabulary Named Entity Recognition as Context-Type Semantic Matching (Jin et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.974.pdf
