Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering

Zhiquan Wen, Yaowei Wang, Mingkui Tan, Qingyao Wu, Qi Wu


Abstract
Visual Question Answering (VQA) aims to answer a textual question about a given image. However, recent studies have shown that VQA models tend to exploit dataset biases to answer questions rather than reasoning over the image and question, which leads to poor generalisation. To alleviate this issue, some existing methods consider the natural distribution of the data and construct samples to balance the dataset, achieving remarkable performance. However, these methods have several limitations: 1) they rely on additional annotations, 2) the generated samples may be inaccurate, e.g., assigned wrong answers, and 3) they ignore the power of positive samples. In this paper, we propose a method to Dig out Discrimination information from Generated samples (DDG) to address these limitations. Specifically, we first construct positive and negative samples in the vision and language modalities without using additional annotations. We then introduce a knowledge distillation mechanism in which the positive samples promote learning on the original samples. Moreover, we use the negative samples to push the VQA models to attend to both the vision and language modalities. Experimental results on the VQA-CP v2 and VQA v2 datasets demonstrate the effectiveness of our DDG.
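The distillation step described in the abstract can be pictured as a standard soft-label knowledge-distillation loss, where the model's prediction on a generated positive sample guides its prediction on the corresponding original sample. The sketch below is a minimal illustration under that assumption; the tensor shapes, names, and temperature scaling are hypothetical and do not reproduce the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(logits_orig, logits_pos, temperature=2.0):
    """Soft-label KL distillation (illustrative sketch, not the paper's exact loss):
    the prediction on the generated positive sample acts as a teacher signal
    for the prediction on the original sample."""
    teacher = F.softmax(logits_pos.detach() / temperature, dim=-1)
    student = F.log_softmax(logits_orig / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2

# Hypothetical usage: a batch of 32 question-image pairs over an answer vocabulary of 3129
# candidates (a commonly used VQA v2 answer-set size).
logits_orig = torch.randn(32, 3129)  # predictions on the original samples
logits_pos = torch.randn(32, 3129)   # predictions on the generated positive samples
loss = distillation_loss(logits_orig, logits_pos)
```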
Anthology ID:
2023.findings-acl.432
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6910–6928
URL:
https://aclanthology.org/2023.findings-acl.432
DOI:
10.18653/v1/2023.findings-acl.432
Cite (ACL):
Zhiquan Wen, Yaowei Wang, Mingkui Tan, Qingyao Wu, and Qi Wu. 2023. Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6910–6928, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering (Wen et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.432.pdf