FuzzyBIO: A Proposal for Fuzzy Representation of Discontinuous Entities

Anne Dirkson, Suzan Verberne, Wessel Kraaij


Abstract
Discontinuous entities pose a challenge to named entity recognition (NER). They occur commonly in the biomedical domain. A common solution is to use expansions of the BIO representation scheme that can handle these entity types (e.g. BIOHD). However, the extra tag types make the NER task more difficult to learn. In this paper we propose an alternative: a fuzzy continuous BIO scheme (FuzzyBIO). We compare FuzzyBIO to BIOHD on the task of Adverse Drug Response extraction and normalization. We find that FuzzyBIO improves recall of NER for two of three data sets and results in a higher percentage of correctly identified disjoint and composite entities for all data sets. Using FuzzyBIO also improves end-to-end performance for continuous and composite entities in two of three data sets. Since FuzzyBIO improves performance for some data sets and the conversion from BIOHD to FuzzyBIO is straightforward, we recommend investigating which scheme is more effective for any data set containing discontinuous entities.
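
The abstract does not spell out how a "fuzzy continuous" tagging is built, so the following is only a minimal sketch under a stated assumption: every token between the first and last token of a discontinuous entity is tagged as inside the entity, and entities that share tokens (as composite entities do) collapse into one continuous span. The function name `fuzzy_bio_tags`, the index-list input format, and the example sentence are hypothetical and are not taken from the paper or from the AnneDirkson/FuzzyBIO repository.

```python
from typing import List, Sequence


def fuzzy_bio_tags(n_tokens: int, entities: Sequence[Sequence[int]]) -> List[str]:
    """Return one BIO tag per token, closing the gaps inside each entity.

    Assumption (not confirmed by the abstract): a discontinuous entity is
    made continuous by tagging the tokens between its parts as "I".
    """
    # Reduce each entity to the (first, last) token positions it covers.
    spans = sorted((min(e), max(e)) for e in entities if e)

    # Merge overlapping spans so that entities sharing tokens become one span.
    merged: List[List[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Write plain BIO tags over the merged, now continuous, spans.
    tags = ["O"] * n_tokens
    for start, end in merged:
        tags[start] = "B"
        for i in range(start + 1, end + 1):
            tags[i] = "I"
    return tags


tokens = ["my", "hands", "and", "feet", "are", "swollen"]
# Hypothetical annotations: "hands ... swollen" and "feet ... swollen".
entities = [[1, 5], [3, 5]]
print(list(zip(tokens, fuzzy_bio_tags(len(tokens), entities))))
```

Under this assumption only the three plain BIO tags are needed, at the cost of also tagging in-between tokens such as "and" and "are" as part of the entity, which is where the "fuzzy" in the name would come from.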
Anthology ID: 2021.louhi-1.9
Volume: Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis
Month: April
Year: 2021
Address: online
Venues: EACL | Louhi
Publisher: Association for Computational Linguistics
Pages: 77–82
URL: https://aclanthology.org/2021.louhi-1.9
Cite (ACL): Anne Dirkson, Suzan Verberne, and Wessel Kraaij. 2021. FuzzyBIO: A Proposal for Fuzzy Representation of Discontinuous Entities. In Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis, pages 77–82, online. Association for Computational Linguistics.
Cite (Informal): FuzzyBIO: A Proposal for Fuzzy Representation of Discontinuous Entities (Dirkson et al., Louhi 2021)
PDF: https://aclanthology.org/2021.louhi-1.9.pdf
Code: AnneDirkson/FuzzyBIO