Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models

Hitomi Yanaka, Yuta Nakamura, Yuki Chida, Tomoya Kurosawa


Abstract
Assessing the numerical understanding of vision-and-language models over images and texts is crucial for real-world vision-and-language applications, such as systems for automated medical image analysis. We provide a visual reasoning dataset focusing on numerical understanding in the medical domain. Experiments with our dataset show that current vision-and-language models fail to perform numerical inference in the medical domain. However, data augmentation with only a small amount of our dataset improves model performance while maintaining performance in the general domain.
Anthology ID:
2023.clinicalnlp-1.2
Volume:
Proceedings of the 5th Clinical Natural Language Processing Workshop
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Anna Rumshisky
Venue:
ClinicalNLP
Publisher:
Association for Computational Linguistics
Pages:
8–18
URL:
https://aclanthology.org/2023.clinicalnlp-1.2
DOI:
10.18653/v1/2023.clinicalnlp-1.2
Cite (ACL):
Hitomi Yanaka, Yuta Nakamura, Yuki Chida, and Tomoya Kurosawa. 2023. Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 8–18, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models (Yanaka et al., ClinicalNLP 2023)
PDF:
https://aclanthology.org/2023.clinicalnlp-1.2.pdf