BibTeX

@inproceedings{wang-etal-2025-towards-adapting,
title = "Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation",
author = "Wang, Hanyin and
Gao, Chufan and
Liu, Bolun and
Xu, Qiping and
Hussein, Guleid and
Labban, Mohamad El and
Iheasirim, Kingsley and
Korsapati, Hariprasad Reddy and
Outcalt, Chuck and
Sun, Jimeng",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.626/",
doi = "10.18653/v1/2025.findings-acl.626",
pages = "12084--12117",
ISBN = "979-8-89176-256-5",
abstract = "Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model, enabling it to generate high-quality clinical notes from outpatient patient-doctor dialogues. Our process incorporates continued pre-training, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model. Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (92.8{\%}) of individual evaluations rated the notes generated by LLaMA-Clinic as ``acceptable'' or higher across all three criteria: real-world readiness, completeness, and accuracy. In the more challenging ``Assessment and Plan'' section, LLaMA-Clinic received the same score as the notes authored by physicians. We highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a best-practice note format, rather than relying on LLMs to determine this for clinical practice."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wang-etal-2025-towards-adapting">
<titleInfo>
<title>Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hanyin</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chufan</namePart>
<namePart type="family">Gao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bolun</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Qiping</namePart>
<namePart type="family">Xu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Guleid</namePart>
<namePart type="family">Hussein</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohamad</namePart>
<namePart type="given">El</namePart>
<namePart type="family">Labban</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kingsley</namePart>
<namePart type="family">Iheasirim</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hariprasad</namePart>
<namePart type="given">Reddy</namePart>
<namePart type="family">Korsapati</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chuck</namePart>
<namePart type="family">Outcalt</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jimeng</namePart>
<namePart type="family">Sun</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model, enabling it to generate high-quality clinical notes from outpatient patient-doctor dialogues. Our process incorporates continued pre-training, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model. Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (92.8%) of individual evaluations rated the notes generated by LLaMA-Clinic as “acceptable” or higher across all three criteria: real-world readiness, completeness, and accuracy. In the more challenging “Assessment and Plan” section, LLaMA-Clinic received the same score as the notes authored by physicians. We highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a best-practice note format, rather than relying on LLMs to determine this for clinical practice.</abstract>
<identifier type="citekey">wang-etal-2025-towards-adapting</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.626</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.626/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>12084</start>
<end>12117</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote

%0 Conference Proceedings
%T Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
%A Wang, Hanyin
%A Gao, Chufan
%A Liu, Bolun
%A Xu, Qiping
%A Hussein, Guleid
%A Labban, Mohamad El
%A Iheasirim, Kingsley
%A Korsapati, Hariprasad Reddy
%A Outcalt, Chuck
%A Sun, Jimeng
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F wang-etal-2025-towards-adapting
%X Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model, enabling it to generate high-quality clinical notes from outpatient patient-doctor dialogues. Our process incorporates continued pre-training, supervised fine-tuning, and reinforcement learning from both AI and human feedback. We introduced a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model. Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians. In a blinded physician reader study, the majority (92.8%) of individual evaluations rated the notes generated by LLaMA-Clinic as “acceptable” or higher across all three criteria: real-world readiness, completeness, and accuracy. In the more challenging “Assessment and Plan” section, LLaMA-Clinic received the same score as the notes authored by physicians. We highlight key considerations for future clinical note-generation tasks, emphasizing the importance of pre-defining a best-practice note format, rather than relying on LLMs to determine this for clinical practice.
%R 10.18653/v1/2025.findings-acl.626
%U https://aclanthology.org/2025.findings-acl.626/
%U https://doi.org/10.18653/v1/2025.findings-acl.626
%P 12084-12117
Markdown (Informal)

[Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation](https://aclanthology.org/2025.findings-acl.626/) (Wang et al., Findings 2025)

ACL

Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Reddy Korsapati, Chuck Outcalt, and Jimeng Sun. 2025. Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 12084–12117, Vienna, Austria. Association for Computational Linguistics.