Haraldur Hauksson


2024

pdf bib
Applications of BERT Models Towards Automation of Clinical Coding in Icelandic
Haraldur Hauksson | Hafsteinn Einarsson
Findings of the Association for Computational Linguistics: NAACL 2024

This study explores the potential of automating clinical coding in Icelandic, a language with limited digital resources, by leveraging over 25 years of electronic health records (EHR) from the Landspitali University Hospital. Traditionally a manual and error-prone task, clinical coding is essential for patient care, billing, and research. Our research delves into the effectiveness of Transformer-based models in automating this process. We investigate various model training strategies, including continued pretraining and model adaptation, under a constrained computational budget. Our findings reveal that the best-performing model achieves competitive results in both micro and macro F1 scores, with label attention contributing significantly to its success. The study also explores the possibility of training on unlabeled data. Our research provides valuable insights into the possibilities of using NLP for clinical coding in low-resource languages, demonstrating that small countries with unique languages and well-segmented healthcare records can achieve results comparable to those in higher-resourced languages.