Unsupervised Abbreviation Detection in Clinical Narratives

Markus Kreuzthaler, Michel Oleynik, Alexander Avian, Stefan Schulz


Abstract
Clinical narratives in electronic health record systems are a rich source of patient-based information. They constitute an ongoing challenge for natural language processing, due to their high compactness and abundance of short forms. German medical texts exhibit numerous ad-hoc abbreviations that terminate with a period character. The disambiguation of period characters is therefore an important task for sentence and abbreviation detection. This task is addressed by a combination of co-occurrence information of word types with trailing period characters, a large domain dictionary, and a simple rule engine, thus merging statistical and dictionary-based disambiguation strategies. The unsupervised approach presented in this paper reaches an F-measure of 0.95. The results are promising for a domain-independent abbreviation detection strategy, because our approach avoids the retraining of models and the use-case-specific feature engineering efforts required for supervised machine learning approaches.
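The combination named in the abstract (co-occurrence of word types with trailing periods, a dictionary lookup, and a simple rule) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `period_ratio` and `is_abbreviation`, the threshold of 0.8, the single-letter rule, and the toy dictionary are all assumptions made for illustration.

```python
from collections import Counter


def period_ratio(tokens):
    """Fraction of each word type's occurrences that carry a trailing period."""
    with_period = Counter()
    total = Counter()
    for tok in tokens:
        base = tok.rstrip(".").lower()
        if not base:
            continue  # skip tokens that consist only of periods
        total[base] += 1
        if tok.endswith("."):
            with_period[base] += 1
    return {word: with_period[word] / total[word] for word in total}


def is_abbreviation(token, ratios, dictionary, threshold=0.8):
    """Hypothetical decision procedure; threshold and rule are assumptions."""
    if not token.endswith("."):
        return False
    base = token.rstrip(".").lower()
    if not base:
        return False
    # Rule (assumed): a single letter followed by a period is an abbreviation.
    if len(base) == 1 and base.isalpha():
        return True
    # Dictionary: a known full word before a period is read as a sentence end.
    if base in dictionary:
        return False
    # Statistics: word types that almost always carry a period are abbreviations.
    return ratios.get(base, 0.0) >= threshold


# Toy corpus and dictionary, invented for this example.
tokens = ["Pat.", "hat", "Fieber.", "Pat.", "ohne", "Fieber", "li.", "Arm", "li.", "Bein"]
dictionary = {"hat", "fieber", "ohne", "arm", "bein"}
ratios = period_ratio(tokens)
abbreviations = [t for t in tokens if is_abbreviation(t, ratios, dictionary)]
```

In this toy corpus, "Pat." and "li." always appear with a period and are absent from the dictionary, so they are flagged, while "Fieber." is treated as a full word followed by a sentence-final period.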
Anthology ID:
W16-4213
Volume:
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Anna Rumshisky, Kirk Roberts, Steven Bethard, Tristan Naumann
Venue:
ClinicalNLP
Publisher:
The COLING 2016 Organizing Committee
Pages:
91–98
URL:
https://aclanthology.org/W16-4213
Cite (ACL):
Markus Kreuzthaler, Michel Oleynik, Alexander Avian, and Stefan Schulz. 2016. Unsupervised Abbreviation Detection in Clinical Narratives. In Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP), pages 91–98, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Unsupervised Abbreviation Detection in Clinical Narratives (Kreuzthaler et al., ClinicalNLP 2016)
PDF:
https://aclanthology.org/W16-4213.pdf