A Privacy-preserving Approach to Ingest Knowledge from Proprietary Web-based to Locally Run Models for Medical Progress Note Generation

Sarvesh Soni, Dina Demner-Fushman


Abstract
Clinical documentation is correlated with increasing clinician burden, leading to the rise of automated methods to generate medical notes. Due to the sensitive nature of patient electronic health records (EHRs), locally run models are preferred for a variety of reasons including privacy, bias, and cost. However, most open-source locally run models (including medical-specific) are much smaller with limited input context size compared to the more powerful closed-source large language models (LLMs) generally available through web APIs (Application Programming Interfaces). In this paper, we propose a framework to harness superior reasoning capabilities and medical knowledge from closed-source online LLMs in a privacy-preserving manner and seamlessly incorporate it into locally run models. Specifically, we leverage a web-based model to distill the vast patient information available in EHRs into a clinically relevant subset without sending sensitive patient health information online and use this distilled knowledge to generate progress notes by a locally run model. Our ablation results indicate that the proposed framework improves the performance of the Mixtral model on progress note generation by 4.6 points on ROUGE (a text-matching based metric) and 7.56 points on MEDCON F1 (a metric that measures the clinical concepts overlap).
Anthology ID:
2024.privatenlp-1.18
Volume:
Proceedings of the Fifth Workshop on Privacy in Natural Language Processing
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
Venues:
PrivateNLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
178–183
Language:
URL:
https://aclanthology.org/2024.privatenlp-1.18
DOI:
Bibkey:
Cite (ACL):
Sarvesh Soni and Dina Demner-Fushman. 2024. A Privacy-preserving Approach to Ingest Knowledge from Proprietary Web-based to Locally Run Models for Medical Progress Note Generation. In Proceedings of the Fifth Workshop on Privacy in Natural Language Processing, pages 178–183, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
A Privacy-preserving Approach to Ingest Knowledge from Proprietary Web-based to Locally Run Models for Medical Progress Note Generation (Soni & Demner-Fushman, PrivateNLP-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.privatenlp-1.18.pdf