Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

Maya Varma; Laurel Orr; Sen Wu; Megan Leszczynski; Xiao Ling; Christopher Ré

doi:10.18653/v1/2021.findings-emnlp.388

Abstract

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.

Anthology ID: 2021.findings-emnlp.388
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4566–4575
URL: https://aclanthology.org/2021.findings-emnlp.388/
DOI: 10.18653/v1/2021.findings-emnlp.388
Cite (ACL): Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, and Christopher Ré. 2021. Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4566–4575, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text (Varma et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.388.pdf
Video: https://aclanthology.org/2021.findings-emnlp.388.mp4
