Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation

Xi Zhang; Zaiqiao Meng; Jake Lever; Edmond S. L. Ho

doi:10.18653/v1/2024.bionlp-1.54

Abstract

This paper introduces a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. The model combines an image encoder (CLIP) with a fine-tuned large language model (LLM) based on the Vicuna-7B architecture. The training process involves a two-stage approach: initial alignment of chest X-ray features with the LLM, followed by fine-tuning for radiology report generation. The study highlights the importance of generating both FINDINGS and IMPRESSIONS sections in radiology reports and evaluates the model’s performance using various metrics, achieving notable accuracy in generating high-quality medical reports. The research also addresses the need for domain-specific fine-tuning to capture the intricate details necessary for accurate medical interpretations and reports.

Anthology ID: 2024.bionlp-1.54
Volume: Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 624–634
URL: https://aclanthology.org/2024.bionlp-1.54/
DOI: 10.18653/v1/2024.bionlp-1.54
Cite (ACL): Xi Zhang, Zaiqiao Meng, Jake Lever, and Edmond S.L. Ho. 2024. Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 624–634, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation (Zhang et al., BioNLP 2024)
PDF: https://aclanthology.org/2024.bionlp-1.54.pdf
