A Multi-modal Debiasing Model with Dynamical Constraint for Robust Visual Question Answering

Yu Li (李豫); Bojie Hu; Fengshuo Zhang; Yahan Yu; Jian Liu; Yufeng Chen; Jinan Xu

doi:10.18653/v1/2023.findings-acl.311

Abstract

Recent studies have pointed out that many well-developed Visual Question Answering (VQA) systems suffer from a bias problem. Despite remarkable performance on In-Distribution (ID) datasets, a VQA model might merely capture the superficial correlation from question to answer rather than show real reasoning ability. Therefore, when switching to an Out-of-Distribution (OOD) dataset, whose test distribution is unknown or even reversed relative to the training set, performance may drop significantly. Although efforts have been devoted to easing the negative effect of the language prior and analysing its inherent cause, they are still limited in two respects. First, most current debiasing methods achieve promising OOD generalization at the cost of a major sacrifice in ID performance. Second, existing research is limited in exploiting comprehensive biases: most work focuses on weakening the language bias, while only a few works consider vision bias. In this paper, we investigate a straightforward way to mitigate the bias problem for the VQA task. Specifically, we reduce the bias effect by subtracting a bias score from the standard VQA base score. Based on this direct strategy, we design two bias learning branches to detect more bias information, which are combined with a dynamical constraint loss to alleviate the problems of over-correction and insufficient debiasing. We evaluate our method on the challenging VQA v2.0 and VQA-CP v2.0 datasets, and the proposed method achieves significant improvement.
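The score-subtraction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the assumption that the two bias branches correspond to a question-side and a vision-side score, and all numeric values are hypothetical, and the dynamical constraint loss is not modelled here.

```python
from typing import List


def debiased_scores(base: List[float],
                    question_bias: List[float],
                    vision_bias: List[float]) -> List[float]:
    """Subtract per-answer bias scores from the base VQA scores.

    Assumption: one bias score per branch and per candidate answer;
    the abstract does not specify how the branches are combined.
    """
    if not (len(base) == len(question_bias) == len(vision_bias)):
        raise ValueError("all score lists must cover the same answer candidates")
    return [b - q - v for b, q, v in zip(base, question_bias, vision_bias)]


def argmax(scores: List[float]) -> int:
    """Index of the highest-scoring candidate answer."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores over three candidate answers: ["yellow", "green", "red"].
answers = ["yellow", "green", "red"]
base = [2.0, 1.5, 0.1]            # standard VQA base scores
question_bias = [1.2, 0.1, 0.0]   # illustrative language prior favouring "yellow"
vision_bias = [0.0, 0.2, 0.0]     # illustrative vision-side bias

scores = debiased_scores(base, question_bias, vision_bias)
print(answers[argmax(base)], "->", answers[argmax(scores)])
```

In this toy case the base model prefers the answer favoured by the language prior, and subtracting the bias scores shifts the prediction to another candidate. Subtracting too much or too little bias is the over-correction and insufficient-debiasing trade-off that the abstract says the dynamical constraint loss is meant to address.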

Anthology ID: 2023.findings-acl.311
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5032–5045
URL: https://aclanthology.org/2023.findings-acl.311
DOI: 10.18653/v1/2023.findings-acl.311
Cite (ACL): Yu Li, Bojie Hu, Fengshuo Zhang, Yahan Yu, Jian Liu, Yufeng Chen, and Jinan Xu. 2023. A Multi-modal Debiasing Model with Dynamical Constraint for Robust Visual Question Answering. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5032–5045, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): A Multi-modal Debiasing Model with Dynamical Constraint for Robust Visual Question Answering (Li et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.311.pdf
Video: https://aclanthology.org/2023.findings-acl.311.mp4
