Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience

Isar Nejadgholi, Kathleen C. Fraser, Berry de Bruijn


Abstract
When comparing entities extracted by a medical entity recognition system with gold standard annotations over a test set, two types of mismatch can occur: label mismatch and span mismatch. Here we focus on span mismatch and show that, because span annotations are subjective, its severity can range from a serious error to a fully acceptable entity extraction. For a domain-specific BERT-based NER system, we show that 25% of the errors have the same label as a gold standard entity and a span that overlaps with it. We collected expert judgements, which show that more than 90% of these mismatches are accepted or partially accepted by the user. Using the training set of the NER system, we built a fast and lightweight entity classifier that approximates the user experience of such mismatches by accepting or rejecting them. The decisions made by this classifier are used to calculate a learning-based F-score, which is shown to be a better approximation of a forgiving user’s experience than the relaxed F-score. We demonstrate the results of applying the proposed evaluation metric to a variety of deep learning medical entity recognition models trained on two datasets.
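The relationship between the strict, relaxed, and learning-based F-scores mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Entity` type, the `overlaps` and `f_score` helpers, and the `accept` callback are hypothetical names introduced here, and spans are assumed to be half-open token offsets. The `accept` callback stands in for whatever decides whether a same-label, overlapping-span mismatch is acceptable; in the paper that role is played by a trained entity classifier, which is not reproduced here.

```python
from typing import Callable, List, NamedTuple


class Entity(NamedTuple):
    """Hypothetical entity record: half-open token span plus a label."""
    start: int
    end: int
    label: str


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the two half-open spans share at least one token."""
    return a.start < b.end and b.start < a.end


def f_score(
    gold: List[Entity],
    pred: List[Entity],
    accept: Callable[[Entity, Entity], bool] = lambda p, g: False,
) -> float:
    """Entity-level F-score.

    Exact matches (same span, same label) always count. A same-label,
    overlapping-span mismatch counts only if `accept(pred, gold)` is True.
    accept always False -> strict score; always True -> relaxed score;
    a classifier's decision -> a learning-based score (assumption: this
    mirrors the idea in the abstract, not the paper's exact procedure).
    """
    def match(p: Entity, g: Entity) -> bool:
        if p.label != g.label:
            return False  # label mismatch is never accepted here
        if (p.start, p.end) == (g.start, g.end):
            return True
        return overlaps(p, g) and accept(p, g)

    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): one exact match, one span mismatch.
gold = [Entity(0, 2, "Disease"), Entity(5, 6, "Drug")]
pred = [Entity(0, 3, "Disease"), Entity(5, 6, "Drug")]

strict = f_score(gold, pred)
relaxed = f_score(gold, pred, accept=lambda p, g: True)
```

On this toy data the strict score penalizes the one-token span difference while the relaxed score ignores it entirely; a score driven by a selective `accept` function falls between the two, depending on which mismatches it accepts.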
Anthology ID:
2020.bionlp-1.19
Volume:
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
Month:
July
Year:
2020
Address:
Online
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
177–186
URL:
https://aclanthology.org/2020.bionlp-1.19
DOI:
10.18653/v1/2020.bionlp-1.19
Cite (ACL):
Isar Nejadgholi, Kathleen C. Fraser, and Berry de Bruijn. 2020. Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience. In Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pages 177–186, Online. Association for Computational Linguistics.
Cite (Informal):
Extensive Error Analysis and a Learning-Based Evaluation of Medical Entity Recognition Systems to Approximate User Experience (Nejadgholi et al., BioNLP 2020)
PDF:
https://aclanthology.org/2020.bionlp-1.19.pdf
Data
MedMentions