Effective Feature Representation for Clinical Text Concept Extraction

Yifeng Tao, Bruno Godefroy, Guillaume Genthial, Christopher Potts


Abstract
Crucial information about the practice of healthcare is recorded only in free-form text, which creates an enormous opportunity for high-impact NLP. However, annotated healthcare datasets tend to be small and expensive to obtain, which raises the question of how to make maximally efficient use of the available data. To this end, we develop an LSTM-CRF model for combining unsupervised word representations and hand-built feature representations derived from publicly available healthcare ontologies. We show that this combined model yields superior performance on five datasets of diverse kinds of healthcare text (clinical, social, scientific, commercial). Each involves the labeling of complex, multi-word spans that pick out different healthcare concepts. We also introduce a new labeled dataset for identifying the treatment relations between drugs and diseases.
Anthology ID:
W19-1901
Volume:
Proceedings of the 2nd Clinical Natural Language Processing Workshop
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota, USA
Editors:
Anna Rumshisky, Kirk Roberts, Steven Bethard, Tristan Naumann
Venue:
ClinicalNLP
Publisher:
Association for Computational Linguistics
Pages:
1–14
URL:
https://aclanthology.org/W19-1901
DOI:
10.18653/v1/W19-1901
Cite (ACL):
Yifeng Tao, Bruno Godefroy, Guillaume Genthial, and Christopher Potts. 2019. Effective Feature Representation for Clinical Text Concept Extraction. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 1–14, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
Cite (Informal):
Effective Feature Representation for Clinical Text Concept Extraction (Tao et al., ClinicalNLP 2019)
PDF:
https://aclanthology.org/W19-1901.pdf
Software:
 W19-1901.Software.txt