Triplet-Trained Vector Space and Sieve-Based Search Improve Biomedical Concept Normalization

Dongfang Xu, Steven Bethard


Abstract
Concept normalization, the task of linking textual mentions of concepts to concepts in an ontology, is critical for mining and analyzing biomedical texts. We propose a vector-space model for concept normalization, where mentions and concepts are encoded via transformer networks that are trained via a triplet objective with online hard triplet mining. The transformer networks refine existing pre-trained models, and the online triplet mining makes training efficient even with hundreds of thousands of concepts by sampling training triplets within each mini-batch. We introduce a variety of strategies for searching with the trained vector-space model, including approaches that incorporate domain-specific synonyms at search time with no model retraining. Across five datasets, our models that are trained only once on their corresponding ontologies are within 3 points of state-of-the-art models that are retrained for each new domain. Our models can also be trained for each domain, achieving new state-of-the-art results on multiple datasets.
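
The triplet objective with online hard mining mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the "batch-hard" mining variant, the Euclidean distance, the margin value, and the function name batch_hard_triplet_loss are all assumptions chosen for illustration, and plain NumPy arrays stand in for transformer outputs. The point it shows is that triplets are formed only from items already in the mini-batch, so no pass over the full ontology is needed per step.

```python
import numpy as np


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Illustrative batch-hard triplet loss for one mini-batch.

    embeddings: (n, d) array of mention/concept vectors (assumed given).
    labels:     (n,) array of concept identifiers, one per vector.
    margin:     hypothetical margin; the paper's value is not stated here.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between every pair in the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & not_self   # same concept, different item
    neg_mask = ~same             # different concept

    # Hardest positive: the farthest item sharing the anchor's concept.
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    # Hardest negative: the closest item from any other concept.
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    # Only anchors with at least one positive and one negative form a triplet.
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    if not valid.any():
        return 0.0

    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean())
```

Because mining happens inside the batch, the cost per step depends on the batch size rather than on the number of concepts in the ontology, which is the efficiency argument the abstract makes.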
Anthology ID:
2021.bionlp-1.2
Original:
2021.bionlp-1.2v1
Version 2:
2021.bionlp-1.2v2
Volume:
Proceedings of the 20th Workshop on Biomedical Language Processing
Month:
June
Year:
2021
Address:
Online
Venues:
BioNLP | NAACL
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
11–22
URL:
https://aclanthology.org/2021.bionlp-1.2
DOI:
10.18653/v1/2021.bionlp-1.2
Cite (ACL):
Dongfang Xu and Steven Bethard. 2021. Triplet-Trained Vector Space and Sieve-Based Search Improve Biomedical Concept Normalization. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 11–22, Online. Association for Computational Linguistics.
Cite (Informal):
Triplet-Trained Vector Space and Sieve-Based Search Improve Biomedical Concept Normalization (Xu & Bethard, BioNLP 2021)
PDF:
https://aclanthology.org/2021.bionlp-1.2.pdf
Code:
dongfang91/triplet-search-connorm