@inproceedings{chen-etal-2025-zero,
title = "Zero-Shot {ATC} Coding with Large Language Models for Clinical Assessments",
author = "Chen, Zijian and
Gamble, John-Michael and
Jantzi, Micaela and
Hirdes, John P. and
Lin, Jimmy",
editor = "Chen, Weizhu and
Yang, Yi and
Kachuee, Mohammad and
Fu, Xue-Yong",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-industry.19/",
doi = "10.18653/v1/2025.naacl-industry.19",
pages = "226--232",
ISBN = "979-8-89176-194-0",
abstract = "Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). Inspired by recent advances in automatic International Classification of Diseases (ICD) coding, our method frames ATC coding as a hierarchical information extraction task, guiding LLMs through the ATC ontology level by level. We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment. Testing across Health Canada drug product data, the RABBITS benchmark, and real clinical notes from Ontario Health, our method achieves 78{\%} exact match accuracy with GPT-4o and 60{\%} with Llama 3.1 70B. We investigate knowledge grounding through drug definitions, finding modest improvements in accuracy. Further, we show that fine-tuned Llama 3.1 8B matches zero-shot Llama 3.1 70B accuracy, suggesting that effective ATC coding is feasible with smaller models. Our results demonstrate the feasibility of automatic ATC coding in privacy-sensitive healthcare environments, providing a foundation for future deployments."
}
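The abstract describes the method as a level-by-level traversal of the ATC ontology, with the LLM choosing one child category at each level until a full fifth-level code is reached. Below is a minimal Python sketch of that idea. The toy ontology slice, the prompt wording, and the ask_llm stub (which simply returns the first offered code so the demo runs end to end) are illustrative assumptions, not the authors' implementation. In practice, ask_llm would call a locally deployed model such as Llama 3.1.

# Minimal sketch of hierarchical ATC coding with an LLM (illustrative only).
# Toy slice of the ATC ontology: each node maps to its child codes and labels.
ATC = {
    "":      {"N": "Nervous system", "C": "Cardiovascular system"},
    "N":     {"N02": "Analgesics"},
    "N02":   {"N02B": "Other analgesics and antipyretics"},
    "N02B":  {"N02BE": "Anilides"},
    "N02BE": {"N02BE01": "Paracetamol"},
}
ALL_CODES = {code for children in ATC.values() for code in children}

def ask_llm(prompt: str) -> str:
    # Stand-in for a call to a locally deployed LLM (e.g., Llama 3.1).
    # It naively returns the first ATC code offered in the prompt so the
    # sketch runs end to end; a real system would query the model here.
    for line in prompt.splitlines():
        code = line.split(":", 1)[0].strip()
        if code in ALL_CODES:
            return code
    raise ValueError("no ATC option found in prompt")

def code_drug(drug: str, node: str = "") -> str:
    # Descend the ontology one level at a time until a leaf code is reached.
    children = ATC.get(node)
    if not children:  # no children recorded: node is a full fifth-level code
        return node
    options = "\n".join(f"{code}: {label}" for code, label in children.items())
    prompt = (
        f"Drug: {drug}\n"
        f"Choose the single best ATC category from:\n{options}\n"
        f"Answer with the code only."
    )
    choice = ask_llm(prompt).strip()
    if choice not in children:
        raise ValueError(f"unknown code {choice!r} returned at node {node!r}")
    return code_drug(drug, choice)

if __name__ == "__main__":
    print(code_drug("acetaminophen 500 mg tablet"))  # prints N02BE01

With a real model behind ask_llm, constraining the answer to the listed child codes at each level (rather than free generation) is what keeps every predicted code valid within the ontology.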