Rare Codes Count: Mining Inter-code Relations for Long-tail Clinical Text Classification

Jiamin Chen, Xuhong Li, Junting Xi, Lei Yu, Haoyi Xiong


Abstract
Multi-label clinical text classification, such as automatic ICD coding, has long been a challenging task in Natural Language Processing because of its long, domain-specific documents and the long-tail distribution over a large label set. Existing methods adopt different model architectures to encode the clinical notes. However, without exploiting the useful connections between labels, these models show a large gap in predictive performance between rare and frequent codes. In this work, we propose a novel method that further mines the helpful relations between different codes via a relation-enhanced code encoder to improve rare-code performance. Starting from simple code descriptions, the model achieves performance comparable to, or even better than, that of models with heavy external knowledge. Our proposed method is evaluated on MIMIC-III, a common dataset in the medical domain. It outperforms the previous state-of-the-art models on both overall metrics and rare-code performance. Moreover, the interpretation results further demonstrate the effectiveness of our method. Our code is publicly available at https://github.com/jiaminchen-1031/Rare-ICD.
Anthology ID:
2023.clinicalnlp-1.43
Volume:
Proceedings of the 5th Clinical Natural Language Processing Workshop
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Anna Rumshisky
Venue:
ClinicalNLP
Publisher:
Association for Computational Linguistics
Pages:
403–413
URL:
https://aclanthology.org/2023.clinicalnlp-1.43
DOI:
10.18653/v1/2023.clinicalnlp-1.43
Cite (ACL):
Jiamin Chen, Xuhong Li, Junting Xi, Lei Yu, and Haoyi Xiong. 2023. Rare Codes Count: Mining Inter-code Relations for Long-tail Clinical Text Classification. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 403–413, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Rare Codes Count: Mining Inter-code Relations for Long-tail Clinical Text Classification (Chen et al., ClinicalNLP 2023)
PDF:
https://aclanthology.org/2023.clinicalnlp-1.43.pdf