Clustering-based Inference for Biomedical Entity Linking

Rico Angell, Nicholas Monath, Sunil Mohan, Nishant Yadav, Andrew McCallum


Abstract
Due to the large number of entities in biomedical knowledge bases, only a small fraction of entities have corresponding labelled training data. This necessitates entity linking models which are able to link mentions of unseen entities using learned representations of entities. Previous approaches link each mention independently, ignoring the relationships within and across documents between the entity mentions. These relations can be very useful for linking mentions in biomedical text where linking decisions are often difficult due to mentions having a generic or a highly specialized form. In this paper, we introduce a model in which linking decisions can be made not merely by linking to a knowledge base entity but also by grouping multiple mentions together via clustering and jointly making linking predictions. In experiments on the largest publicly available biomedical dataset, we improve the best independent prediction for entity linking by 3.0 points of accuracy, and our clustering-based inference model further improves entity linking by 2.3 points.
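The contrast between independent linking and clustering-based joint linking can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes precomputed mention-entity and mention-mention affinity scores, a greedy single-linkage merge, a hypothetical `threshold` parameter, and a constraint that each cluster holds at most one entity. All function names and the toy scores are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple

Node = Tuple[str, str]  # ("m", mention_id) or ("e", entity_id)


def independent_link(mention_entity: Dict[str, Dict[str, float]]) -> Dict[str, Optional[str]]:
    """Baseline: each mention picks its own highest-scoring entity."""
    return {
        m: (max(scores, key=scores.get) if scores else None)
        for m, scores in mention_entity.items()
    }


def cluster_and_link(
    mention_entity: Dict[str, Dict[str, float]],
    mention_mention: Dict[Tuple[str, str], float],
    threshold: float = 0.5,  # assumed cutoff; weaker edges are ignored
) -> Dict[str, Optional[str]]:
    """Greedy joint linking: merge along the strongest edges first,
    never letting one cluster contain two different entities."""
    parent: Dict[Node, Node] = {}
    entity_of: Dict[Node, Optional[str]] = {}  # cluster root -> its entity

    def add(node: Node) -> None:
        parent.setdefault(node, node)
        entity_of.setdefault(node, node[1] if node[0] == "e" else None)

    def find(x: Node) -> Node:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for m, scores in mention_entity.items():
        add(("m", m))
        for e, s in scores.items():
            add(("e", e))
            edges.append((s, ("m", m), ("e", e)))
    for (a, b), s in mention_mention.items():
        add(("m", a))
        add(("m", b))
        edges.append((s, ("m", a), ("m", b)))

    for s, u, v in sorted(edges, key=lambda t: t[0], reverse=True):
        if s < threshold:
            break
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if entity_of[ru] is not None and entity_of[rv] is not None:
            continue  # merging would put two entities in one cluster
        parent[rv] = ru
        if entity_of[ru] is None:
            entity_of[ru] = entity_of[rv]

    return {n[1]: entity_of[find(n)] for n in parent if n[0] == "m"}


# Toy data (hypothetical): "m1" is a generic mention with only a weak entity score,
# while "m2" is a specific mention that is strongly coreferent with "m1".
mention_entity = {"m1": {"E2": 0.4}, "m2": {"E1": 0.9}}
mention_mention = {("m1", "m2"): 0.8}
independent = independent_link(mention_entity)
joint = cluster_and_link(mention_entity, mention_mention)
```

In this toy case the independent baseline links the generic mention `m1` to `E2`, whereas the joint procedure groups `m1` with `m2` and links both to `E1`.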
Anthology ID:
2021.naacl-main.205
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2598–2608
URL:
https://aclanthology.org/2021.naacl-main.205
DOI:
10.18653/v1/2021.naacl-main.205
Cite (ACL):
Rico Angell, Nicholas Monath, Sunil Mohan, Nishant Yadav, and Andrew McCallum. 2021. Clustering-based Inference for Biomedical Entity Linking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2598–2608, Online. Association for Computational Linguistics.
Cite (Informal):
Clustering-based Inference for Biomedical Entity Linking (Angell et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.205.pdf
Video:
https://aclanthology.org/2021.naacl-main.205.mp4
Data:
MedMentions