CmEAA: Cross-modal Enhancement and Alignment Adapter for Radiology Report Generation

Xiyang Huang, Yingjie Han, Yx L, Runzhi Li, Pengcheng Wu, Kunli Zhang


Abstract
Automatic radiology report generation is pivotal in reducing the workload of radiologists while improving diagnostic accuracy and operational efficiency. Current methods face significant challenges, including effectively aligning medical visual features with textual features and mitigating data bias. In this paper, we propose a method for radiology report generation that uses a Cross-modal Enhancement and Alignment Adapter (CmEAA) to connect a vision encoder with a frozen large language model. Specifically, we introduce two novel modules within CmEAA: Cross-modal Feature Enhancement (CFE) and Neural Mutual Information Aligner (NMIA). CFE extracts observation-related contextual features and uses them, through a cross-modal enhancement transformer, to enhance the visual features of lesions and abnormal regions in radiology images. NMIA maximizes neural mutual information between visual and textual representations within a low-dimensional alignment embedding space during training and provides potential globally aligned visual representations during inference. Additionally, a weight generator is designed to dynamically balance the cross-modal enhanced features and the vanilla visual features. Experimental results on two widely used datasets, IU X-Ray and MIMIC-CXR, demonstrate that the proposed model outperforms previous state-of-the-art methods.
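The abstract does not say which mutual information estimator NMIA uses. As a rough illustration only, the sketch below computes the Donsker-Varadhan lower bound, one common basis for neural mutual information estimation, from a matrix of critic scores between visual and textual embeddings. The function name `dv_lower_bound`, the dot-product critic, and the use of off-diagonal pairs as product-of-marginals samples are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def dv_lower_bound(scores: np.ndarray) -> float:
    """Donsker-Varadhan lower bound on mutual information (hypothetical helper).

    scores[i, j] is a critic value T(v_i, t_j) for visual embedding i and
    textual embedding j. Diagonal entries are matched (joint) pairs; the
    off-diagonal entries are treated as samples from the product of marginals.
    """
    n = scores.shape[0]
    if scores.ndim != 2 or scores.shape != (n, n) or n < 2:
        raise ValueError("scores must be a square matrix with at least 2 rows")

    joint = np.diag(scores)                      # T on matched pairs
    marginal = scores[~np.eye(n, dtype=bool)]    # T on mismatched pairs

    # Numerically stable log(mean(exp(marginal))).
    shift = marginal.max()
    log_mean_exp = shift + np.log(np.mean(np.exp(marginal - shift)))

    return float(joint.mean() - log_mean_exp)


if __name__ == "__main__":
    # Toy data: textual embeddings are noisy copies of the visual ones.
    rng = np.random.default_rng(0)
    visual = rng.normal(size=(4, 8))
    textual = visual + 0.1 * rng.normal(size=(4, 8))

    # Assumed critic: a plain dot product in the shared embedding space.
    print(dv_lower_bound(visual @ textual.T))
```

In a trainable setting the critic would be a learned network and the bound would be maximized by gradient ascent; here the critic is fixed so that the arithmetic of the bound is easy to follow.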
Anthology ID:
2025.coling-main.571
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
8546–8556
URL:
https://aclanthology.org/2025.coling-main.571/
Cite (ACL):
Xiyang Huang, Yingjie Han, Yx L, Runzhi Li, Pengcheng Wu, and Kunli Zhang. 2025. CmEAA: Cross-modal Enhancement and Alignment Adapter for Radiology Report Generation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8546–8556, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
CmEAA: Cross-modal Enhancement and Alignment Adapter for Radiology Report Generation (Huang et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.571.pdf