MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings

Jean-Philippe Corbeil; Minseon Kim; Maxime Griot; Sheela Agarwal; Alessandro Sordoni; Francois Beaulieu; Paul Vozila

MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings

Jean-Philippe Corbeil, Minseon Kim, Maxime Griot, Sheela Agarwal, Alessandro Sordoni, Francois Beaulieu, Paul Vozila

Abstract

As the performance of large language models (LLMs) continues to advance, their adoption in the medical domain is increasing. However, most existing risk evaluations largely focused on general safety benchmarks. In the medical applications, LLMs may be used by a wide range of users, ranging from general users and patients to clinicians, with diverse levels of expertise and the model’s outputs can have a direct impact on human health which raises serious safety concerns. In this paper, we introduce MedRiskEval, a medical risk evaluation benchmark tailored to the medical domain. To fill the gap in previous benchmarks that only focused on the clinician perspective, we introduce a new patient-oriented dataset called PatientSafetyBench containing 466 samples across 5 critical risk categories. Leveraging our new benchmark alongside existing datasets, we evaluate a variety of open- and closed-source LLMs. To the best of our knowledge, this work establishes an initial foundation for safer deployment of LLMs in healthcare.

Anthology ID:: 2026.eacl-industry.39
Volume:: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:: March
Year:: 2026
Address:: Rabat, Morocco
Editors:: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue:: EACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 513–524
Language:
URL:: https://aclanthology.org/2026.eacl-industry.39/
DOI:
Bibkey:
Cite (ACL):: Jean-Philippe Corbeil, Minseon Kim, Maxime Griot, Sheela Agarwal, Alessandro Sordoni, Francois Beaulieu, and Paul Vozila. 2026. MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 513–524, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):: MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings (Corbeil et al., EACL 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.eacl-industry.39.pdf

PDF Cite Search Fix data