BibTeX
@inproceedings{eslami-etal-2023-pubmedclip,
title = "{P}ub{M}ed{CLIP}: How Much Does {CLIP} Benefit Visual Question Answering in the Medical Domain?",
author = "Eslami, Sedigheh and
Meinel, Christoph and
de Melo, Gerard",
editor = "Vlachos, Andreas and
Augenstein, Isabelle",
booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-eacl.88",
doi = "10.18653/v1/2023.findings-eacl.88",
pages = "1181--1193",
abstract = "Contrastive Language{--}Image Pre-training (CLIP) has shown remarkable success in learning with cross-modal supervision from extensive amounts of image{--}text pairs collected online. Thus far, the effectiveness of CLIP has been investigated primarily in general-domain multimodal problems. In this work, we evaluate the effectiveness of CLIP for the task of Medical Visual Question Answering (MedVQA). We present PubMedCLIP, a fine-tuned version of CLIP for the medical domain based on PubMed articles. Our experiments conducted on two MedVQA benchmark datasets illustrate that PubMedCLIP achieves superior results improving the overall accuracy up to 3{\%} in comparison to the state-of-the-art Model-Agnostic Meta-Learning (MAML) networks pre-trained only on visual data. The PubMedCLIP model with different back-ends, the source code for pre-training them and reproducing our MedVQA pipeline is publicly available at \url{https://github.com/sarahESL/PubMedCLIP}.",
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="eslami-etal-2023-pubmedclip">
    <titleInfo>
      <title>PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Sedigheh</namePart>
      <namePart type="family">Eslami</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Christoph</namePart>
      <namePart type="family">Meinel</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Gerard</namePart>
      <namePart type="family">de Melo</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: EACL 2023</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Andreas</namePart>
        <namePart type="family">Vlachos</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Isabelle</namePart>
        <namePart type="family">Augenstein</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Dubrovnik, Croatia</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Contrastive Language–Image Pre-training (CLIP) has shown remarkable success in learning with cross-modal supervision from extensive amounts of image–text pairs collected online. Thus far, the effectiveness of CLIP has been investigated primarily in general-domain multimodal problems. In this work, we evaluate the effectiveness of CLIP for the task of Medical Visual Question Answering (MedVQA). We present PubMedCLIP, a fine-tuned version of CLIP for the medical domain based on PubMed articles. Our experiments conducted on two MedVQA benchmark datasets illustrate that PubMedCLIP achieves superior results improving the overall accuracy up to 3% in comparison to the state-of-the-art Model-Agnostic Meta-Learning (MAML) networks pre-trained only on visual data. The PubMedCLIP model with different back-ends, the source code for pre-training them and reproducing our MedVQA pipeline is publicly available at https://github.com/sarahESL/PubMedCLIP.</abstract>
    <identifier type="citekey">eslami-etal-2023-pubmedclip</identifier>
    <identifier type="doi">10.18653/v1/2023.findings-eacl.88</identifier>
    <location>
      <url>https://aclanthology.org/2023.findings-eacl.88</url>
    </location>
    <part>
      <date>2023-05</date>
      <extent unit="page">
        <start>1181</start>
        <end>1193</end>
      </extent>
    </part>
  </mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?
%A Eslami, Sedigheh
%A Meinel, Christoph
%A de Melo, Gerard
%Y Vlachos, Andreas
%Y Augenstein, Isabelle
%S Findings of the Association for Computational Linguistics: EACL 2023
%D 2023
%8 May
%I Association for Computational Linguistics
%C Dubrovnik, Croatia
%F eslami-etal-2023-pubmedclip
%X Contrastive Language–Image Pre-training (CLIP) has shown remarkable success in learning with cross-modal supervision from extensive amounts of image–text pairs collected online. Thus far, the effectiveness of CLIP has been investigated primarily in general-domain multimodal problems. In this work, we evaluate the effectiveness of CLIP for the task of Medical Visual Question Answering (MedVQA). We present PubMedCLIP, a fine-tuned version of CLIP for the medical domain based on PubMed articles. Our experiments conducted on two MedVQA benchmark datasets illustrate that PubMedCLIP achieves superior results improving the overall accuracy up to 3% in comparison to the state-of-the-art Model-Agnostic Meta-Learning (MAML) networks pre-trained only on visual data. The PubMedCLIP model with different back-ends, the source code for pre-training them and reproducing our MedVQA pipeline is publicly available at https://github.com/sarahESL/PubMedCLIP.
%R 10.18653/v1/2023.findings-eacl.88
%U https://aclanthology.org/2023.findings-eacl.88
%U https://doi.org/10.18653/v1/2023.findings-eacl.88
%P 1181-1193
Markdown (Informal)
[PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?](https://aclanthology.org/2023.findings-eacl.88) (Eslami et al., Findings 2023)
ACL
Sedigheh Eslami, Christoph Meinel, and Gerard de Melo. 2023. PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1181–1193, Dubrovnik, Croatia. Association for Computational Linguistics.
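
Code sketch (illustrative)
The abstract describes PubMedCLIP as CLIP contrastively fine-tuned on image-caption pairs drawn from PubMed articles, later used as the visual encoder in a MedVQA pipeline. The short Python sketch below is only a minimal, hedged illustration of that general recipe using the Hugging Face transformers CLIP classes; it is not the authors' implementation (the official code is at https://github.com/sarahESL/PubMedCLIP), and the checkpoint name, toy data, batch size, and learning rate are placeholder assumptions.

import torch
from torch.utils.data import DataLoader
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Start from a general-domain CLIP checkpoint (placeholder choice).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy stand-in for a PubMed-style figure-caption corpus; real training would
# load medical images and their captions instead.
pairs = [
    (Image.new("RGB", (224, 224)), f"placeholder medical figure caption {i}")
    for i in range(8)
]

def collate(batch):
    # Turn a list of (PIL image, caption) tuples into CLIP model inputs.
    images, captions = zip(*batch)
    return processor(text=list(captions), images=list(images),
                     return_tensors="pt", padding=True, truncation=True)

loader = DataLoader(pairs, batch_size=4, shuffle=True, collate_fn=collate)

model.train()
for batch in loader:
    # return_loss=True makes CLIPModel compute the symmetric contrastive
    # (InfoNCE) loss over the image-text similarity matrix in the batch.
    loss = model(**batch, return_loss=True).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The fine-tuned image encoder would then stand in for a purely visual (e.g. MAML-pretrained) encoder in a downstream MedVQA model, which is the comparison the abstract reports.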