A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare

Manar Aljohani; Jun Hou; Sindhura Kommu; Xuan Wang

doi:10.18653/v1/2025.findings-emnlp.356

Abstract

The application of large language models (LLMs) in healthcare holds significant promise for enhancing clinical decision-making, medical research, and patient care. However, their integration into real-world clinical settings raises critical concerns about trustworthiness, particularly along the dimensions of truthfulness, privacy, safety, robustness, fairness, and explainability. These dimensions are essential for ensuring that LLMs generate reliable, unbiased, and ethically sound outputs. Although researchers have recently begun developing benchmarks and evaluation frameworks to assess LLM trustworthiness, the trustworthiness of LLMs in healthcare specifically remains underexplored, and the field lacks a systematic review that offers a comprehensive understanding and insights for future work. This survey addresses that gap by reviewing current methodologies and solutions aimed at mitigating risks across the key trust dimensions. We analyze how each dimension affects the reliability and ethical deployment of healthcare LLMs, synthesize ongoing research efforts, and identify critical gaps in existing approaches. We also identify emerging challenges posed by evolving paradigms, such as multi-agent collaboration, multi-modal reasoning, and the development of small open-source medical models. Our goal is to guide future research toward more trustworthy, transparent, and clinically viable LLMs.

Anthology ID: 2025.findings-emnlp.356
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6720–6748
URL: https://aclanthology.org/2025.findings-emnlp.356/
DOI: 10.18653/v1/2025.findings-emnlp.356
Cite (ACL): Manar Aljohani, Jun Hou, Sindhura Kommu, and Xuan Wang. 2025. A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 6720–6748, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): A Comprehensive Survey on the Trustworthiness of Large Language Models in Healthcare (Aljohani et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.356.pdf
Checklist: 2025.findings-emnlp.356.checklist.pdf
