Enhancing Automatic ICD-9-CM Code Assignment for Medical Texts with PubMed

Danchen Zhang, Daqing He, Sanqiang Zhao, Lei Li


Abstract
Assigning standard ICD-9-CM codes to disease symptoms in medical texts is an important task in the medical domain, and automating this process could greatly reduce costs. However, the effectiveness of an automatic ICD-9-CM code classifier can be seriously undermined by unbalanced training data. Frequent diseases often have more training data, which helps their classifiers perform better than those of infrequent diseases. However, a disease’s frequency does not necessarily reflect its importance. To resolve this training data shortage problem, we propose to strategically draw data from PubMed to enrich the training data when needed. We validate our method on the CMC dataset, and the evaluation results indicate that our method can significantly improve the code assignment classifiers’ performance at the macro-averaging level.
Anthology ID:
W17-2333
Volume:
BioNLP 2017
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Kevin Bretonnel Cohen, Dina Demner-Fushman, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
263–271
URL:
https://aclanthology.org/W17-2333
DOI:
10.18653/v1/W17-2333
Cite (ACL):
Danchen Zhang, Daqing He, Sanqiang Zhao, and Lei Li. 2017. Enhancing Automatic ICD-9-CM Code Assignment for Medical Texts with PubMed. In BioNLP 2017, pages 263–271, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Enhancing Automatic ICD-9-CM Code Assignment for Medical Texts with PubMed (Zhang et al., BioNLP 2017)
PDF:
https://aclanthology.org/W17-2333.pdf