Multimodal Dual-Path Decoding for Medical Report Generation

Jinghan Sun, Dong Wei, Zhihong Zhu, Yuyang Xue, Steven McDonagh, Xian Wu

doi:10.18653/v1/2026.findings-acl.1997

Abstract

Radiology report generation requires precise alignment between medical imaging findings and clinically coherent textual descriptions. Current methods predominantly rely on either large vision-language models (LVLMs) for visual grounding or large language models (LLMs) for medical narrative generation, but they often fail to integrate multimodal clinical evidence effectively with domain-specific knowledge. This paper proposes a novel multimodal dual-path framework that combines LVLMs and LLMs to address these limitations. Our approach establishes a dynamic fusion between the visual-semantic grounding capabilities of LVLMs and the clinical knowledge reasoning of LLMs. Specifically, we employ a structured prompting strategy that decomposes the report generation task into three clinically meaningful sections and introduces fine-grained multi-label classification prompts to guide the models, enabling more accurate and comprehensive clinical report generation. Experiments on the public MIMIC-CXR benchmark demonstrate that our framework outperforms state-of-the-art methods.

Anthology ID: 2026.findings-acl.1997
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 40193–40204
URL: https://aclanthology.org/2026.findings-acl.1997/
DOI: 10.18653/v1/2026.findings-acl.1997
Cite (ACL): Jinghan Sun, Dong Wei, Zhihong Zhu, Yuyang Xue, Steven McDonagh, and Xian Wu. 2026. Multimodal Dual-Path Decoding for Medical Report Generation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 40193–40204, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Multimodal Dual-Path Decoding for Medical Report Generation (Sun et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1997.pdf
Checklist: 2026.findings-acl.1997.checklist.pdf