Improving Few-shot Prompting using Cluster-based Sample Retrieval for Medical NER in Clinical Text

Meethu Mohan C, Sneha Shaji Punnan, Jeena Kleenankandy


Abstract
Named Entity Recognition (NER) in the medical domain is crucial for extracting essential information from clinical text. Large Language Models (LLMs) have demonstrated remarkable capabilities in this task, but their performance is highly dependent on the quality of the prompts. Few-shot prompting, or prompt-by-example, in which the input query to the LLM is augmented with one or more sample outputs, is a well-known technique for guiding LLMs toward the expected result. The quality of the samples in the prompt plays an important role in this task. This paper proposes to improve the performance of few-shot prompting for medical NER on clinical text using a cluster-based strategy for sample selection. We employ concepts from Retrieval Augmented Generation (RAG) and K-means clustering to identify the most similar annotated examples for any given input text. Using these contextually relevant yet divergent training samples as examples, we guide the LLM toward extracting more accurate medical entities. Our experiments using the Llama-2 model show that this approach significantly outperforms zero-shot prompting and randomly sampled few-shot prompting on the two datasets chosen for this study, demonstrating the efficacy of cluster-based retrieval in improving few-shot prompting for medical NER tasks.
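The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoder, the number of clusters, or how retrieval and clustering are combined. The sketch assumes one plausible reading, in which the candidates most similar to the query are retrieved first, then grouped with K-means, and the closest candidate from each cluster is kept so the examples are relevant but not redundant. The bag-of-words `embed` function, the helper names, and the parameters `k` and `top_n` are all hypothetical stand-ins.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy bag-of-words vector; a stand-in (assumption) for a real sentence encoder."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each non-empty cluster's centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_examples(query, pool, k=2, top_n=4):
    """Return up to k pool texts that are similar to the query yet drawn from different clusters."""
    vocab = sorted({w for t in pool + [query] for w in t.lower().split()})
    q = embed(query, vocab)
    vecs = [embed(t, vocab) for t in pool]
    # Retrieval step: keep the top_n pool items most similar to the query.
    ranked = sorted(range(len(pool)), key=lambda i: cosine(q, vecs[i]), reverse=True)[:top_n]
    k = min(k, len(ranked))
    # Clustering step: group the retrieved candidates.
    labels = kmeans([vecs[i] for i in ranked], k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in zip(ranked, labels) if lab == c]
        if members:
            # From each cluster, keep the member closest to the query.
            chosen.append(max(members, key=lambda i: cosine(q, vecs[i])))
    return [pool[i] for i in chosen]
```

In a few-shot prompt, each selected text would be paired with its gold entity annotations and placed before the query. With a real encoder and a larger annotated pool, `k` would correspond to the number of shots.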
Anthology ID:
2024.icon-1.4
Volume:
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2024
Address:
AU-KBC Research Centre, Chennai, India
Editors:
Sobha Lalitha Devi, Karunesh Arora
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
37–44
URL:
https://aclanthology.org/2024.icon-1.4/
Cite (ACL):
Meethu Mohan C, Sneha Shaji Punnan, and Jeena Kleenankandy. 2024. Improving Few-shot Prompting using Cluster-based Sample Retrieval for Medical NER in Clinical Text. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 37–44, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal):
Improving Few-shot Prompting using Cluster-based Sample Retrieval for Medical NER in Clinical Text (C et al., ICON 2024)
PDF:
https://aclanthology.org/2024.icon-1.4.pdf