KIA: Knowledge-Guided Implicit Vision-Language Alignment for Chest X-Ray Report Generation

Heng Yin | Shanlin Zhou | Pandong Wang | Zirui Wu | Yongtao Hao

Proceedings of the 31st International Conference on Computational Linguistics, 2025
Report generation (RG) faces challenges in understanding complex medical images and establishing cross-modal semantic alignment in radiology image-report pairs. Previous methods often overlook fine-grained cross-modal interaction, leading to an insufficient understanding of detailed information. Recently, various large multimodal models have been proposed for image-text tasks; however, such models still underperform on rare-domain tasks such as understanding complex medical images. To address these limitations, we develop a new framework of Knowledge-guided Implicit vision-language Alignment for radiology report generation, named KIA. To better understand medical reports and images and to build alignment between them, we introduce multi-task implicit alignment, forming a comprehensive understanding of medical images and reports. Additionally, to further meet medical refinement requirements, we design novel masking strategies guided by medical knowledge to enhance pathological observation and anatomical landmark understanding.
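The abstract does not specify how knowledge-guided masking is implemented; as an illustration only, the idea of preferentially masking clinically salient tokens can be sketched as follows. The term sets, the always-mask policy for knowledge terms, and the `base_rate` fallback are all assumptions for this sketch, not the paper's actual design (which may draw terms from a radiology ontology and use different masking probabilities).

```python
import random

# Hypothetical term sets standing in for an external medical knowledge source.
PATHOLOGY_TERMS = {"opacity", "effusion", "cardiomegaly", "pneumothorax"}
ANATOMY_TERMS = {"lung", "heart", "mediastinum", "diaphragm"}

def knowledge_guided_mask(tokens, mask_token="[MASK]", base_rate=0.15, seed=0):
    """Mask pathology/anatomy tokens preferentially; mask others at base_rate.

    This biases a masked-prediction objective toward pathological observations
    and anatomical landmarks, the two categories the abstract highlights.
    """
    rng = random.Random(seed)  # seeded per-call RNG for reproducibility
    masked = []
    for tok in tokens:
        if tok.lower() in PATHOLOGY_TERMS or tok.lower() in ANATOMY_TERMS:
            masked.append(mask_token)   # always mask knowledge terms (assumed policy)
        elif rng.random() < base_rate:
            masked.append(mask_token)   # standard random masking for other tokens
        else:
            masked.append(tok)
    return masked

report = "The heart size is normal with no pleural effusion".split()
print(knowledge_guided_mask(report))
```

Under this scheme, "heart" and "effusion" are always replaced by `[MASK]`, so the report generator must recover them from the paired image, which is one plausible way such a masking strategy could drive fine-grained vision-language alignment.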