Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models

Ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, May Dongmei Wang, Wei Jin, Joyce Ho, Carl Yang


Abstract
Clinical natural language processing faces challenges like complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet their direct deployment can raise privacy issues and is constrained by resources. To address this challenge, we delve into synthetic clinical text generation with LLMs for clinical NLP tasks. We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process. Our model involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 8 clinical NLP tasks and 18 datasets reveals that ClinGen consistently enhances performance across various tasks by 7.7%-8.7% on average, effectively aligning with the distribution of real datasets and enriching the diversity of generated training instances.
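As a rough illustration of the context-informed prompting idea described in the abstract, the sketch below pairs a sampled clinical topic with a sampled writing style to form a generation prompt. It is a minimal sketch under stated assumptions, not the authors' implementation: the topic and style lists, the prompt wording, and the names `TOPICS`, `STYLES`, `build_prompt`, and `sample_prompts` are all hypothetical, and no LLM is actually called.

```python
import random

# Hypothetical knowledge pools. In the paper, topics and styles are drawn
# from external knowledge graphs and LLMs; here they are hard-coded.
TOPICS = ["asthma", "type 2 diabetes", "hypertension"]
STYLES = ["discharge summary", "patient forum question", "research abstract"]


def build_prompt(label: str, topic: str, style: str) -> str:
    """Compose one prompt that conditions generation on a topic and a style."""
    return (
        f"Suppose you are writing a {style}. "
        f"Write one short passage about {topic} "
        f"that belongs to the class '{label}'."
    )


def sample_prompts(label: str, n: int, seed: int = 0) -> list[str]:
    """Draw n (topic, style) pairs at random and build one prompt per pair."""
    rng = random.Random(seed)  # seeded so the sampled prompts are reproducible
    return [
        build_prompt(label, rng.choice(TOPICS), rng.choice(STYLES))
        for _ in range(n)
    ]


if __name__ == "__main__":
    for prompt in sample_prompts("treatment", n=3):
        print(prompt)
```

Each resulting prompt would then be sent to an LLM to produce one synthetic training instance; varying the topic and style across prompts is what is meant to diversify the generated data.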
Anthology ID:
2024.findings-acl.916
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15496–15523
URL:
https://aclanthology.org/2024.findings-acl.916
Cite (ACL):
Ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, May Dongmei Wang, Wei Jin, Joyce Ho, and Carl Yang. 2024. Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 15496–15523, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models (Xu et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.916.pdf