Question Modifiers in Visual Question Answering

William Britton, Somdeb Sarkhel, Deepak Venugopal


Abstract
Visual Question Answering (VQA) is a challenge problem that can advance AI by integrating several important sub-disciplines, including natural language understanding and computer vision. Large, publicly available VQA datasets for training and evaluation have driven the growth of VQA models that obtain increasingly higher accuracy scores. However, it is also important to understand how well a model understands the details provided in a question. For example, studies in psychology have shown that syntactic complexity places a larger cognitive load on humans. Analogously, we want to understand whether models have the perceptual capability to handle modifications to questions. Therefore, we develop a new dataset using Amazon Mechanical Turk, where we asked workers to add modifiers to questions based on object properties and spatial relationships. We evaluate this data on LXMERT, a state-of-the-art VQA model that focuses more extensively on language processing. Our results indicate that there is a significant negative impact on the performance of the model when the questions are modified to include more detailed information.
Anthology ID:
2022.lrec-1.158
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1472–1479
URL:
https://aclanthology.org/2022.lrec-1.158
Cite (ACL):
William Britton, Somdeb Sarkhel, and Deepak Venugopal. 2022. Question Modifiers in Visual Question Answering. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1472–1479, Marseille, France. European Language Resources Association.
Cite (Informal):
Question Modifiers in Visual Question Answering (Britton et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.158.pdf
Data:
Visual Question Answering, Visual Question Answering v2.0