Fine-grained Medical Vision-Language Representation Learning for Radiology Report Generation

Siyuan Wang, Bo Peng, Yichao Liu, Qi Peng


Abstract
Given input radiology images, radiology report generation aims to produce accurate and comprehensive medical reports, which typically consist of multiple descriptive clinical sentences associated with different phenotypes. Most existing works rely on a pre-trained vision encoder to extract visual representations of the images. In this study, we propose a phenotype-driven medical vision-language representation learning framework that efficiently bridges the gap between the visual and textual modalities to improve text-oriented generation. In contrast to conventional methods, which learn medical vision-language representations by contrasting images with entire reports, our approach learns finer-grained representations by contrasting images with each sentence within the reports. These fine-grained representations can then be used to improve radiology report generation. Experiments on two widely used datasets, MIMIC-CXR and IU X-ray, demonstrate that our method achieves promising performance and substantially outperforms conventional vision-language representation learning methods.
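The abstract does not state the exact training objective, so the following is only a rough illustration of the difference between report-level and sentence-level contrast, not the authors' implementation. It assumes precomputed, nonzero image and sentence embeddings, cosine similarity, a one-directional InfoNCE loss with a temperature of 0.1, and mean pooling of sentence embeddings to form a report embedding. All function names (`info_nce`, `report_level_loss`, `sentence_level_loss`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(query: Vector, candidates: List[Vector], positive: int, tau: float) -> float:
    """Negative log softmax probability of the positive candidate (temperature tau)."""
    logits = [cosine(query, c) / tau for c in candidates]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[positive]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean; used here (an assumption) to pool sentences into a report."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def report_level_loss(images: List[Vector], reports: List[List[Vector]], tau: float = 0.1) -> float:
    """Conventional: each image is contrasted with whole reports (pooled sentences)."""
    report_embs = [mean_vector(sents) for sents in reports]
    losses = [info_nce(img, report_embs, i, tau) for i, img in enumerate(images)]
    return sum(losses) / len(losses)


def sentence_level_loss(images: List[Vector], reports: List[List[Vector]], tau: float = 0.1) -> float:
    """Fine-grained: every sentence is an anchor whose positive is its own image."""
    losses = []
    for i, sents in enumerate(reports):
        for s in sents:
            losses.append(info_nce(s, images, i, tau))
    return sum(losses) / len(losses)


# Toy batch: two images; the first report has two sentences, the second has one.
images = [[1.0, 0.0], [0.0, 1.0]]
reports = [[[0.9, 0.1], [0.8, 0.3]], [[0.1, 0.9]]]
print(report_level_loss(images, reports), sentence_level_loss(images, reports))
```

In the sentence-level variant each sentence contributes its own loss term, so a sentence describing one finding is not averaged away by the rest of the report. This sketch treats sentences from other reports only implicitly (through their images) and ignores the possibility that identical sentences, such as a shared normal finding, appear in several reports; handling that would require further assumptions.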
Anthology ID:
2023.emnlp-main.989
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15949–15956
URL:
https://aclanthology.org/2023.emnlp-main.989
DOI:
10.18653/v1/2023.emnlp-main.989
Cite (ACL):
Siyuan Wang, Bo Peng, Yichao Liu, and Qi Peng. 2023. Fine-grained Medical Vision-Language Representation Learning for Radiology Report Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15949–15956, Singapore. Association for Computational Linguistics.
Cite (Informal):
Fine-grained Medical Vision-Language Representation Learning for Radiology Report Generation (Wang et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.989.pdf