CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models

Paul Grundmann; Jan Frick; Dennis Fast; Thomas Steffek; Felix Gers; Wolfgang Nejdl; Alexander Löser

Abstract

With their growing capabilities, generative large language models (LLMs) are increasingly being investigated for complex medical tasks. However, their effectiveness in real-world clinical applications remains underexplored. To address this, we present CliniBench, the first benchmark that enables direct comparison of well-studied encoder-based classifiers and generative LLMs for discharge diagnosis prediction from admission notes in the MIMIC-IV dataset. Our extensive study compares 12 generative LLMs and 3 encoder-based classifiers and demonstrates that encoder-based classifiers consistently outperform generative models in diagnosis prediction. We assess several retrieval augmentation strategies for in-context learning from similar patients and find that they provide notable performance improvements for generative LLMs.

Anthology ID: 2026.eacl-long.247
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 5360–5378
URL: https://aclanthology.org/2026.eacl-long.247/
Cite (ACL): Paul Grundmann, Jan Frick, Dennis Fast, Thomas Steffek, Felix Gers, Wolfgang Nejdl, and Alexander Löser. 2026. CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5360–5378, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models (Grundmann et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-long.247.pdf
Checklist: 2026.eacl-long.247.checklist.pdf
