Chih-Hao Lin


pdf bib
Nested Named Entity Recognition for Chinese Electronic Health Records with QA-based Sequence Labeling
Yu-Lun Chiang | Chih-Hao Lin | Cheng-Lung Sung | Keh-Yih Su
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

This study presents a novel QA-based sequence labeling (QASL) approach to naturally tackle both flat and nested Named Entity Recogntion (NER) tasks on a Chinese Electronic Health Records (CEHRs) dataset. This proposed QASL approach parallelly asks a corresponding natural language question for each specific named entity type, and then identifies those associated NEs of the same specified type with the BIO tagging scheme. The associated nested NEs are then formed by overlapping the results of various types. In comparison with those pure sequence-labeling (SL) approaches, since the given question includes significant prior knowledge about the specified entity type and the capability of extracting NEs with different types, the performance for nested NER task is thus improved, obtaining 90.70% of F1-score. Besides, in comparison with the pure QA-based approach, our proposed approach retains the SL features, which could extract multiple NEs with the same types without knowing the exact number of NEs in the same passage in advance. Eventually, experiments on our CEHR dataset demonstrate that QASL-based models greatly outperform the SL-based models by 6.12% to 7.14% of F1-score.