Nested Named Entity Recognition for Chinese Electronic Health Records with QA-based Sequence Labeling

Yu-Lun Chiang, Chih-Hao Lin, Cheng-Lung Sung, Keh-Yih Su


Abstract
This study presents a novel QA-based sequence labeling (QASL) approach that naturally handles both flat and nested Named Entity Recognition (NER) on a Chinese Electronic Health Records (CEHRs) dataset. For each named entity type, the QASL approach asks a corresponding natural-language question in parallel, and then identifies the NEs of that type with the BIO tagging scheme. Nested NEs are then formed by overlaying the results obtained for the various types. Compared with pure sequence-labeling (SL) approaches, each question injects significant prior knowledge about its entity type and allows NEs of different types to be extracted separately; performance on the nested NER task is therefore improved, reaching an F1-score of 90.70%. Moreover, compared with pure QA-based approaches, the proposed approach retains the SL property of extracting multiple NEs of the same type without knowing in advance how many such NEs the passage contains. Finally, experiments on our CEHR dataset show that QASL-based models outperform SL-based models by 6.12% to 7.14% in F1-score.
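The decoding idea described in the abstract (one question per entity type, BIO tags per question, then overlaying the per-type results so that nested NEs emerge) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the entity types `Disease` and `BodyPart`, the English question templates in `QUESTIONS`, and the helper names `build_queries`, `bio_to_spans`, `merge_types`, and `nested_pairs` are all hypothetical, and the tagger itself is replaced by hand-written BIO sequences.

```python
from typing import Dict, List, Tuple

# (start, end_exclusive, entity_type); character offsets into the passage.
Span = Tuple[int, int, str]

# Hypothetical entity types and question templates, for illustration only;
# the paper's actual type inventory and question wording are not given here.
QUESTIONS: Dict[str, str] = {
    "Disease": "Which spans mention a disease?",
    "BodyPart": "Which spans mention a body part?",
}


def build_queries(passage: str) -> List[Tuple[str, str, str]]:
    """Pair the same passage with one question per entity type.

    Each (type, question, passage) triple would be tagged independently,
    so all types can be processed in parallel (e.g., as one batch).
    """
    return [(etype, q, passage) for etype, q in QUESTIONS.items()]


def bio_to_spans(tags: List[str], etype: str) -> List[Span]:
    """Decode one BIO sequence into spans of a single entity type.

    Any number of spans can be recovered, so the number of entities of
    this type need not be known in advance. A stray "I" with no open span
    is treated as the start of a new span.
    """
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i, etype))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i, etype))
            start = None
        # tag == "I" with an open span: the current span simply extends.
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def merge_types(tagged: Dict[str, List[str]]) -> List[Span]:
    """Overlay per-type results into one span set; spans may overlap or nest."""
    spans = [s for etype, tags in tagged.items() for s in bio_to_spans(tags, etype)]
    # Sort by start, then longer spans first, so outer entities precede inner ones.
    return sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]), s[2]))


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner's boundaries lie within outer's."""
    return [
        (a, b)
        for a in spans
        for b in spans
        if a != b and a[0] <= b[0] and b[1] <= a[1]
    ]


if __name__ == "__main__":
    passage = "右肺腺癌術後"  # toy passage, character-level tokens
    # Hypothetical tagger outputs, one BIO sequence per question.
    tagged = {
        "Disease": ["B", "I", "I", "I", "O", "O"],   # 右肺腺癌
        "BodyPart": ["B", "I", "O", "O", "O", "O"],  # 右肺
    }
    spans = merge_types(tagged)
    for s in spans:
        print(s, passage[s[0]:s[1]])
    print(nested_pairs(spans))
```

Because each entity type is decoded from its own BIO sequence, an inner span of one type never competes with an outer span of another type for the same tag position; the nesting is recovered simply by comparing span boundaries after merging.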
Anthology ID:
2021.rocling-1.3
Volume:
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
Month:
October
Year:
2021
Address:
Taoyuan, Taiwan
Venue:
ROCLING
Publisher:
The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages:
18–25
URL:
https://aclanthology.org/2021.rocling-1.3
Cite (ACL):
Yu-Lun Chiang, Chih-Hao Lin, Cheng-Lung Sung, and Keh-Yih Su. 2021. Nested Named Entity Recognition for Chinese Electronic Health Records with QA-based Sequence Labeling. In Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021), pages 18–25, Taoyuan, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):
Nested Named Entity Recognition for Chinese Electronic Health Records with QA-based Sequence Labeling (Chiang et al., ROCLING 2021)
PDF:
https://aclanthology.org/2021.rocling-1.3.pdf