Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning?

Sushant Kumar Ray; Gautam Siddharth Kashyap; Sahil Tripathi; Nipun Joshi; Vijay Govindarajan; Rafiq Ali; Jiechao Gao; Usman Naseem

Abstract

Clinical Question-Answering (CQA) industry systems increasingly rely on Large Language Models (LLMs), yet their deployment is often guided by the assumption that domain-specific fine-tuning is essential. Although specialised medical LLMs such as BioBERT, BioGPT, and PubMedBERT remain popular, they face practical limitations including narrow coverage, high retraining costs, and limited adaptability. Efforts based on Supervised Fine-Tuning (SFT) have attempted to address these limitations but continue to reinforce what we term the SPECIALISATION FALLACY: the belief that specialised medical LLMs are inherently superior for CQA. To challenge this assumption, we introduce MEDASSESS-X, a deployment-industry-oriented CQA framework that applies alignment at inference time rather than through SFT. MEDASSESS-X uses lightweight steering vectors to guide model activations toward medically consistent reasoning without updating model weights or requiring domain-specific retraining. This inference-time alignment layer stabilises CQA performance across both general-purpose and specialised medical LLMs, thereby resolving the SPECIALISATION FALLACY. Empirically, MEDASSESS-X delivers consistent gains across all LLM families, improving Accuracy by up to +6% and Factual Consistency by +7%, and reducing Safety Error Rate by as much as 50%.
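
The abstract does not specify how the steering vectors are built or applied, so the following is only a minimal sketch of the general idea of inference-time activation steering, not the MEDASSESS-X implementation. It assumes a toy two-layer network in place of an LLM, a difference-of-means steering vector computed from two hypothetical sets of example inputs (`positive` and `negative`), and a hypothetical scaling factor `alpha`. All function names are illustrative.

```python
import random

def make_layer(n_in, n_out, rng):
    # Random weight matrix standing in for a frozen, pre-trained layer.
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def hidden(w1, x):
    # Hidden activations (ReLU) of the toy model.
    return [max(0.0, h) for h in matvec(w1, x)]

def forward(w1, w2, x, steer=None, alpha=0.0):
    # Forward pass; optionally shifts hidden activations by alpha * steer.
    # The weights w1 and w2 are only read, never updated.
    h = hidden(w1, x)
    if steer is not None:
        h = [hi + alpha * si for hi, si in zip(h, steer)]
    return matvec(w2, h)

def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vector(w1, positive, negative):
    # Assumed construction: mean activation on desired examples
    # minus mean activation on undesired examples.
    pos = mean_vec([hidden(w1, x) for x in positive])
    neg = mean_vec([hidden(w1, x) for x in negative])
    return [p - q for p, q in zip(pos, neg)]

rng = random.Random(0)
w1 = make_layer(4, 8, rng)
w2 = make_layer(8, 3, rng)

# Hypothetical contrast sets; in a CQA setting these would be prompts
# exhibiting desired vs. undesired behaviour.
positive = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(16)]
negative = [[rng.uniform(-1, 0) for _ in range(4)] for _ in range(16)]

v = steering_vector(w1, positive, negative)
x = [0.2, -0.1, 0.4, 0.3]
base = forward(w1, w2, x)
steered = forward(w1, w2, x, steer=v, alpha=1.0)
```

In this sketch the only thing that changes between `base` and `steered` is the added activation offset; setting `alpha` to 0 recovers the unsteered output, which is what makes the approach removable at deployment time without retraining.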

Anthology ID: 2026.eacl-industry.64
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 869–876
URL: https://aclanthology.org/2026.eacl-industry.64/
Cite (ACL): Sushant Kumar Ray, Gautam Siddharth Kashyap, Sahil Tripathi, Nipun Joshi, Vijay Govindarajan, Rafiq Ali, Jiechao Gao, and Usman Naseem. 2026. Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning? In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 869–876, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning? (Ray et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-industry.64.pdf
