A Comprehensive Evaluation of Biomedical Entity-centric Search

Elena Tutubalina, Zulfat Miftahutdinov, Vladimir Muravlev, Anastasia Shneyderman


Abstract
Biomedical information retrieval has often been evaluated by whether a system correctly detects entity spans and links them to concepts from a given terminology. Most academic research has focused on evaluating named entity recognition (NER) and entity linking (EL) models, which are key components for recognizing diseases and genes in PubMed abstracts. In this work, we perform a fine-grained evaluation intended to understand the efficiency of a state-of-the-art BERT-based information extraction (IE) architecture as a biomedical search engine. We present a novel manually annotated dataset of abstracts for disease and gene search. The dataset contains 23K query-abstract pairs: 152 queries selected from the logs of our target discovery platform, paired with PubMed abstracts annotated with relevance judgments. The query list also includes a subset of concepts with at least one ambiguous concept name. As a baseline, we use off-the-shelf Elasticsearch with BM25. Our experiments on NER, EL, and retrieval in a zero-shot setup show that the neural IE architecture achieves superior performance for both disease and gene concept queries.
Anthology ID:
2022.emnlp-industry.61
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
596–605
URL:
https://aclanthology.org/2022.emnlp-industry.61
DOI:
10.18653/v1/2022.emnlp-industry.61
Cite (ACL):
Elena Tutubalina, Zulfat Miftahutdinov, Vladimir Muravlev, and Anastasia Shneyderman. 2022. A Comprehensive Evaluation of Biomedical Entity-centric Search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 596–605, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
A Comprehensive Evaluation of Biomedical Entity-centric Search (Tutubalina et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-industry.61.pdf