WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking

Afshin Rahimi, Timothy Baldwin, Karin Verspoor


Abstract
We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources. We propose a cross-lingual neural reranking model to match a UMLS concept with a Wikipedia page, which achieves a recall@1 of 72%, a substantial improvement of 20% over word- and char-level BM25, enabling manual alignment with minimal effort. We release our resources, including ranked Wikipedia pages for 700k UMLS concepts, and WikiUMLS, a dataset for training and evaluation of alignment models between UMLS and Wikipedia collected from Wikidata. This will provide easier access to Wikipedia for health professionals, patients, and NLP systems, including in multilingual settings.
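The recall@1 figure quoted in the abstract is the fraction of UMLS concepts whose correct Wikipedia page is ranked first. The following is a minimal sketch of how such a metric could be computed, not the authors' evaluation code: the function name `recall_at_k`, the dictionary layout, and the concept IDs and page titles are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List


def recall_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int = 1) -> float:
    """Fraction of concepts whose gold page appears among the top-k ranked pages.

    rankings: concept ID -> candidate page titles, best first (assumed layout).
    gold:     concept ID -> the single correct page title (assumed layout).
    """
    if not gold:
        return 0.0
    hits = sum(
        1
        for concept_id, page in gold.items()
        if page in rankings.get(concept_id, [])[:k]
    )
    return hits / len(gold)


# Toy data with placeholder IDs and titles (not taken from the released dataset).
rankings = {
    "C0000001": ["Page A", "Page B"],
    "C0000002": ["Page C", "Page D"],
    "C0000003": ["Page E", "Page F"],
    "C0000004": ["Page G", "Page H"],
}
gold = {
    "C0000001": "Page A",  # ranked first
    "C0000002": "Page D",  # ranked second
    "C0000003": "Page E",  # ranked first
    "C0000004": "Page X",  # not retrieved at all
}

print(recall_at_k(rankings, gold, k=1))
print(recall_at_k(rankings, gold, k=2))
```

On this toy data two of the four concepts have their gold page at rank 1, and a third is recovered at rank 2, so raising k can only keep recall the same or increase it.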
Anthology ID:
2020.coling-main.523
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5957–5962
URL:
https://aclanthology.org/2020.coling-main.523
DOI:
10.18653/v1/2020.coling-main.523
Cite (ACL):
Afshin Rahimi, Timothy Baldwin, and Karin Verspoor. 2020. WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5957–5962, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking (Rahimi et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.523.pdf
Code
 afshinrahimi/wikiumls