A Deterministic Multi-Stage Retrieval Pipeline for Longitudinal EHR Question Answering

Shubham Agarwal; Thomas Searle; Richard Dobson; Ninoslav Majkic; Niko Moller-Grell

Abstract

Retrieval-augmented generation (RAG) holds promise for clinical question answering over electronic health records (EHRs), but existing systems treat retrieval as an opaque subroutine, limiting auditability and reliability in patient care workflows. We introduce a deterministic multi-stage retrieval pipeline for longitudinal EHR question answering that decomposes retrieval into four distinct stages. Each stage is ablated and instrumented with diagnostic metrics, making the flow of clinical evidence measurable and auditable at every step. Evaluated on a broad LLM-annotated cohort and an expert-annotated cardiovascular benchmark developed alongside clinicians from real ICU records, the full pipeline achieves a 22–23% relative recall gain over a strong dense retrieval baseline across both cohorts, with consistent improvements in downstream answer quality. The pipeline’s deterministic and transparent design addresses a critical gap in clinical NLP: the need for retrieval systems that clinicians and researchers can not only rely on, but also inspect, audit, and build upon for real-world deployment.

Anthology ID: 2026.bionlp-1.53
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 665–678
URL: https://aclanthology.org/2026.bionlp-1.53/
Cite (ACL): Shubham Agarwal, Thomas Searle, Richard Dobson, Ninoslav Majkic, and Niko Moller-Grell. 2026. A Deterministic Multi-Stage Retrieval Pipeline for Longitudinal EHR Question Answering. In BioNLP 2026, pages 665–678, San Diego, California. Association for Computational Linguistics.
Cite (Informal): A Deterministic Multi-Stage Retrieval Pipeline for Longitudinal EHR Question Answering (Agarwal et al., BioNLP 2026)
PDF: https://aclanthology.org/2026.bionlp-1.53.pdf
