RuCCoD: Towards Automated ICD Coding in Russian

Alexandr Nesterov; Andrey Sakhovskiy; Ivan Sviridov; Airat Valiev; Vladimir Makharev; Petr Anokhin; Galina Zubkova; Elena Tutubalina

doi:10.18653/v1/2025.emnlp-main.129

Abstract

This study investigates the feasibility of automating clinical coding in Russian, a language with limited biomedical resources. We present a new dataset for ICD coding, which includes diagnosis fields from electronic health records (EHRs) annotated with over 10,000 entities and more than 1,500 unique ICD codes. This dataset serves as a benchmark for several state-of-the-art models, including BERT, LLaMA with LoRA, and RAG, with additional experiments examining transfer learning across domains (from PubMed abstracts to medical diagnoses) and terminologies (from UMLS concepts to ICD codes). We then apply the best-performing model to label an in-house EHR dataset containing patient histories from 2017 to 2021. Our experiments, conducted on a carefully curated test set, demonstrate that training on the automatically predicted codes leads to a significant improvement in accuracy compared to training on data manually annotated by physicians. We believe our findings offer valuable insights into the potential for automating clinical coding in resource-limited languages like Russian, which could enhance clinical efficiency and data accuracy in these contexts. Our code and dataset are available at https://github.com/auto-icd-coding/ruccod.

Anthology ID: 2025.emnlp-main.129
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2558–2585
URL: https://aclanthology.org/2025.emnlp-main.129/
DOI: 10.18653/v1/2025.emnlp-main.129
Cite (ACL): Alexandr Nesterov, Andrey Sakhovskiy, Ivan Sviridov, Airat Valiev, Vladimir Makharev, Petr Anokhin, Galina Zubkova, and Elena Tutubalina. 2025. RuCCoD: Towards Automated ICD Coding in Russian. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 2558–2585, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): RuCCoD: Towards Automated ICD Coding in Russian (Nesterov et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.129.pdf
Checklist: 2025.emnlp-main.129.checklist.pdf
