Instruction-Tuning LLaMA for Synthetic Medical Note Generation in Swedish and English

Lotta Kiefer, Jesujoba Alabi, Thomas Vakili, Hercules Dalianis, Dietrich Klakow


Abstract
The increasing capabilities of large language models (LLMs) have unlocked transformative potential for medical applications, but privacy constraints limit access to high-quality training data from electronic health records (EHRs). In response, we propose a framework that generates synthetic EHRs by instruction-tuning an LLM on descriptions of diagnosis codes. We show that this framework overcomes problems of prior approaches, such as reduced diversity and medical incoherence, while maintaining strong privacy protections. Utility was measured by training models to predict diagnosis codes for EHRs. Real data still yields higher utility, but results with synthetic data approach those with real data as dataset size increases. The remaining differences in utility are most likely due to noise in the synthetic data. A user study with medical professionals found no significant loss in readability or medical coherence compared with real EHRs, although inter-annotator agreement was low. These findings establish synthetic EHRs as a viable alternative for privacy-preserving and scalable clinical NLP applications. We release our code on GitHub.
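The abstract describes the generation framework in a single sentence. As a rough illustration only, the Python sketch below shows one way diagnosis-code descriptions could be turned into instruction/response pairs for instruction-tuning. The prompt template, the `InstructionExample` type, and the helper names are hypothetical and are not taken from the paper or its released code; the two codes are ordinary ICD-10 examples chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical prompt template; the paper's actual instruction wording may differ.
TEMPLATE = (
    "Write a clinical note for a patient with the following diagnoses:\n"
    "{diagnoses}\n"
)


@dataclass
class InstructionExample:
    instruction: str  # prompt built from diagnosis-code descriptions
    response: str     # target note the model learns to produce


def format_instruction(codes, descriptions):
    """Render each code with its textual description, one per line."""
    lines = [f"- {code}: {descriptions[code]}" for code in codes]
    return TEMPLATE.format(diagnoses="\n".join(lines))


def build_training_example(codes, descriptions, note):
    """Pair a code-description instruction with a note for fine-tuning."""
    return InstructionExample(
        instruction=format_instruction(codes, descriptions),
        response=note,
    )


# Illustrative ICD-10 codes and descriptions (example data only).
descriptions = {
    "I10": "Essential (primary) hypertension",
    "J45": "Asthma",
}

example = build_training_example(["I10", "J45"], descriptions, "Patient presents with ...")
print(example.instruction)
```

At generation time, one would pass only the instruction to the tuned model. A synthetic note produced this way could then be labeled with the codes in its prompt, which is one plausible way (an assumption here) to obtain training data for the diagnosis-code prediction task used to measure utility.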
Anthology ID:
2025.ranlp-1.65
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
557–566
URL:
https://aclanthology.org/2025.ranlp-1.65/
Cite (ACL):
Lotta Kiefer, Jesujoba Alabi, Thomas Vakili, Hercules Dalianis, and Dietrich Klakow. 2025. Instruction-Tuning LLaMA for Synthetic Medical Note Generation in Swedish and English. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 557–566, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Instruction-Tuning LLaMA for Synthetic Medical Note Generation in Swedish and English (Kiefer et al., RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.65.pdf