MasonNLP at MEDIQA-WV 2025: Multimodal Retrieval-Augmented Generation with Large Language Models for Medical VQA

A H M Rezaul Karim; Ozlem Uzuner

Abstract

Medical Visual Question Answering (MedVQA) enables natural language queries over medical images to support clinical decision-making and patient care. The MEDIQA-WV 2025 shared task addressed wound-care VQA, requiring systems to generate free-text responses and structured wound attributes from images and patient queries. We present the MasonNLP system, which employs a general-domain, instruction-tuned large language model with a retrieval-augmented generation (RAG) framework that incorporates textual and visual examples from in-domain data. This approach grounds outputs in clinically relevant exemplars, improving reasoning, schema adherence, and response quality across dBLEU, ROUGE, BERTScore, and LLM-based metrics. Our best-performing system ranked 3rd among 19 teams and 51 submissions with an average score of 41.37%. This result demonstrates that lightweight RAG with general-purpose LLMs (a minimal inference-time layer that adds a few relevant exemplars via simple indexing and fusion, with no extra training or complex re-ranking) provides a simple and effective baseline for multimodal clinical NLP tasks.
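
The "simple indexing and fusion" step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Exemplar` class, the `retrieve` and `build_prompt` functions, the weighted-sum fusion with weight `alpha`, and the toy two-dimensional vectors are all hypothetical, chosen only to show how a few in-domain exemplars might be selected by combined text and image similarity and prepended to a prompt.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One in-domain training case (hypothetical structure)."""
    query: str
    response: str
    text_vec: List[float]   # embedding of the patient query (assumed precomputed)
    image_vec: List[float]  # embedding of the wound image (assumed precomputed)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(text_vec: List[float], image_vec: List[float],
             index: List[Exemplar], k: int = 2, alpha: float = 0.5) -> List[Exemplar]:
    """Rank exemplars by a weighted sum of text and image similarity.

    The weighted sum is an assumed fusion rule, used here for illustration.
    """
    scored = [
        (alpha * cosine(text_vec, ex.text_vec)
         + (1 - alpha) * cosine(image_vec, ex.image_vec), ex)
        for ex in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:k]]


def build_prompt(query: str, exemplars: List[Exemplar]) -> str:
    """Prepend the retrieved exemplars to the new query as few-shot context."""
    parts = []
    for i, ex in enumerate(exemplars, start=1):
        parts.append(f"Example {i}:\nPatient query: {ex.query}\nResponse: {ex.response}")
    parts.append(f"Patient query: {query}\nResponse:")
    return "\n\n".join(parts)


# Toy index with made-up vectors and placeholder text.
index = [
    Exemplar("q_a", "r_a", [1.0, 0.0], [1.0, 0.0]),
    Exemplar("q_b", "r_b", [0.0, 1.0], [0.0, 1.0]),
    Exemplar("q_c", "r_c", [1.0, 1.0], [1.0, 1.0]),
]
top = retrieve([1.0, 0.0], [1.0, 0.0], index, k=2)
prompt = build_prompt("new patient query", top)
print(prompt)
```

Because retrieval and prompt assembly happen entirely at inference time, a sketch like this needs no additional training, which is the property the abstract highlights.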

Anthology ID: 2025.clinicalnlp-1.10
Volume: Proceedings of the 7th Clinical Natural Language Processing Workshop
Month: October
Year: 2025
Address: Virtual
Editors: Asma Ben Abacha, Steven Bethard, Danielle Bitterman, Tristan Naumann, Kirk Roberts
Venues: ClinicalNLP | WS
Publisher: Association for Computational Linguistics
Pages: 84–94
URL: https://aclanthology.org/2025.clinicalnlp-1.10/
Cite (ACL): A H M Rezaul Karim and Ozlem Uzuner. 2025. MasonNLP at MEDIQA-WV 2025: Multimodal Retrieval-Augmented Generation with Large Language Models for Medical VQA. In Proceedings of the 7th Clinical Natural Language Processing Workshop, pages 84–94, Virtual. Association for Computational Linguistics.
Cite (Informal): MasonNLP at MEDIQA-WV 2025: Multimodal Retrieval-Augmented Generation with Large Language Models for Medical VQA (Karim & Uzuner, ClinicalNLP 2025)
PDF: https://aclanthology.org/2025.clinicalnlp-1.10.pdf
