A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity

Mengnan Zhao, Aaron J. Masino, Christopher C. Yang


Abstract
We investigate the quality of task-specific word embeddings created with relatively small, targeted corpora. We present a comprehensive evaluation framework, including both intrinsic and extrinsic evaluation, that can be extended to named entities beyond drug names. Intrinsic evaluation results show that drug name embeddings created with a domain-specific document corpus outperformed previously published versions that were derived from a very large general text corpus. The extrinsic evaluation uses the word embeddings for drug name recognition with a Bi-LSTM model, and the results demonstrate the advantage of using domain-specific word embeddings as the only input feature for drug name recognition, achieving an F1-score of 0.91. This work suggests that it may be advantageous to derive domain-specific embeddings for certain tasks even when the domain-specific corpus is of limited size.
Anthology ID:
W18-2319
Volume:
Proceedings of the BioNLP 2018 workshop
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venues:
ACL | BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
156–160
URL:
https://aclanthology.org/W18-2319
DOI:
10.18653/v1/W18-2319
Cite (ACL):
Mengnan Zhao, Aaron J. Masino, and Christopher C. Yang. 2018. A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity. In Proceedings of the BioNLP 2018 workshop, pages 156–160, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity (Zhao et al., 2018)
PDF:
https://aclanthology.org/W18-2319.pdf