Improving Automatic KCD Coding: Introducing the KoDAK and an Optimized Tokenization Method for Korean Clinical Documents

Geunyeong Jeong, Juoh Sun, Seokwon Jeong, Hyunjin Shin, Harksoo Kim


Abstract
International Classification of Diseases (ICD) coding is the task of mapping a patient’s electronic health records to standardized codes, which is crucial for enhancing medical services and reducing healthcare costs. In Korea, automatic Korean Standard Classification of Diseases (KCD) coding has been hindered by limited resources, differences between ICD systems, and language-specific characteristics. We therefore construct the Korean Dataset for Automatic KCD coding (KoDAK) by collecting and preprocessing Korean clinical documents. In addition, we propose a tokenization method optimized for Korean clinical documents. Our experiments show that the proposed method outperforms Korean Medical BERT (KM-BERT) in Macro-F1 by 0.14 percentage points (%p) while using fewer model parameters, demonstrating its effectiveness on Korean clinical documents.
Anthology ID: 2023.clinicalnlp-1.12
Volume: Proceedings of the 5th Clinical Natural Language Processing Workshop
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Anna Rumshisky
Venue: ClinicalNLP
Publisher: Association for Computational Linguistics
Pages: 96–101
URL: https://aclanthology.org/2023.clinicalnlp-1.12
DOI: 10.18653/v1/2023.clinicalnlp-1.12
Cite (ACL): Geunyeong Jeong, Juoh Sun, Seokwon Jeong, Hyunjin Shin, and Harksoo Kim. 2023. Improving Automatic KCD Coding: Introducing the KoDAK and an Optimized Tokenization Method for Korean Clinical Documents. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 96–101, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Improving Automatic KCD Coding: Introducing the KoDAK and an Optimized Tokenization Method for Korean Clinical Documents (Jeong et al., ClinicalNLP 2023)
PDF: https://aclanthology.org/2023.clinicalnlp-1.12.pdf