Med-HALT: Medical Domain Hallucination Test for Large Language Models

Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu


Abstract
This research paper focuses on the challenges posed by hallucinations in large language models (LLMs), particularly in the context of the medical domain. Hallucination, wherein these models generate plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT provides a diverse multinational dataset derived from medical examinations across various countries and includes multiple innovative testing modalities. Med-HALT includes two categories of tests: reasoning-based and memory-based hallucination tests, designed to assess LLMs' problem-solving and information-retrieval abilities. Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, Llama-2, MPT, and Falcon, revealing significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility. Through this work, we aim to contribute to the development of safer and more reliable language models in healthcare. Our benchmark can be found at medhalt.github.io.
Anthology ID:
2023.conll-1.21
Volume:
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Jing Jiang, David Reitter, Shumin Deng
Venue:
CoNLL
Publisher:
Association for Computational Linguistics
Pages:
314–334
URL:
https://aclanthology.org/2023.conll-1.21
DOI:
10.18653/v1/2023.conll-1.21
Cite (ACL):
Ankit Pal, Logesh Kumar Umapathi, and Malaikannan Sankarasubbu. 2023. Med-HALT: Medical Domain Hallucination Test for Large Language Models. In Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL), pages 314–334, Singapore. Association for Computational Linguistics.
Cite (Informal):
Med-HALT: Medical Domain Hallucination Test for Large Language Models (Pal et al., CoNLL 2023)
PDF:
https://aclanthology.org/2023.conll-1.21.pdf