Shayok Chakraborty

2025

Large language models (LLMs) hold promise for advancing patient–provider communication, yet a persistent gap remains between benchmark-driven model development and the realities of clinical practice. This work presents a systematic, clinically grounded review of text-based medical datasets for LLM training and evaluation. We propose a scenario-based taxonomy derived from established clinical frameworks to map major knowledge-based and conversation-based corpora against core communication scenarios. We further synthesize core communication skills from gold-standard clinical assessment instruments and meta-analyze state-of-the-art medical LLM performance, highlighting how dataset properties, fine-tuning strategies, and evaluation metrics shape both knowledge acquisition and communicative competence. To empirically validate these findings, we conducted controlled fine-tuning experiments across representative LLMs, demonstrating that data composition and scenario alignment critically affect model performance. Our findings highlight the urgent need for scenario-rich datasets and standardized, human-centered evaluation protocol to advance clinically relevant medical LLMs.

pdf bib abs

MediVLM: A Vision Language Model for Radiology Report Generation from Medical Images
Debanjan Goswami | Ronast Subedi | Shayok Chakraborty
Findings of the Association for Computational Linguistics: EMNLP 2025

Generating radiology reports from medical images has garnered sufficient attention in the research community. While existing methods have demonstrated promise, they often tend to generate reports that are factually incomplete and inconsistent, fail to focus on informative regions within an image, and impose strong annotation assumptions, such as bounding box annotations, image level annotations (which can be challenging to obtain) for model training. In this paper, we propose MediVLM, a vision language model (VLM) for radiology report generation from medical images. The proposed model consists of a pre-trained object detector to extract the salient anatomical regions from the images, an image encoder, a text encoder, a module to align the visual and text representations, a cross attention layer to fuse the two representations and finally, a transformer based decoder to generate the final report. MediVLM can generate radiology reports even when no reports are available for training; this is an extremely useful feature, as curating such reports is a labor-intensive task. Further, it computes a severity score (depicting the seriousness of a patient’s medical condition) from the generated radiology reports, which can be used to prioritize patients who need immediate medical attention. Our extensive empirical analyses on three benchmark datasets corroborate the promise and potential of our method against competing baselines. Our code is open-sourcedin our project webpage at: https://sites.google.com/view/medivlm/home

Co-authors

Venues

Findings2

Fix author