Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart


Abstract
Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a more careful look reveals that they often do not understand the rich signal they are fed. To understand and better measure the generalization capabilities of VQA systems, we look at their robustness to counterfactually augmented data. Our proposed augmentations are designed to make a focused intervention on a specific property of the question such that the answer changes. Using these augmentations, we propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions between original and augmented examples. Through extensive experimentation, we show that RAD, unlike classical accuracy measures, can quantify when state-of-the-art systems are not robust to counterfactuals. We find substantial failure cases which reveal that current VQA systems are still brittle. Finally, we connect robustness to generalization, demonstrating the predictive power of RAD for performance on unseen augmentations.
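The consistency idea behind RAD can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes one plausible reading: among original examples the model answers correctly, the fraction whose augmented counterpart is also answered correctly. The function name `rad_score` and its parameters are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence


def rad_score(
    orig_preds: Sequence[str],
    orig_gold: Sequence[str],
    aug_preds: Sequence[str],
    aug_gold: Sequence[str],
) -> float:
    """Assumed reading of RAD (not confirmed by the abstract):
    of the original examples answered correctly, the fraction whose
    augmented counterpart is also answered correctly.
    Index i in every sequence refers to the same original/augmented pair.
    """
    n = len(orig_preds)
    if not (len(orig_gold) == len(aug_preds) == len(aug_gold) == n):
        raise ValueError("all four sequences must have the same length")

    # Indices of original examples the model got right.
    correct_orig = [i for i in range(n) if orig_preds[i] == orig_gold[i]]
    if not correct_orig:
        return float("nan")  # undefined when nothing was correct originally

    # Count how many of those stay correct after the intervention.
    still_correct = sum(1 for i in correct_orig if aug_preds[i] == aug_gold[i])
    return still_correct / len(correct_orig)


# Toy data: the augmentation changes the question so the gold answer changes.
orig_preds = ["yes", "2", "red", "no"]
orig_gold = ["yes", "2", "blue", "no"]
aug_preds = ["no", "2", "green", "no"]
aug_gold = ["no", "3", "green", "yes"]

score = rad_score(orig_preds, orig_gold, aug_preds, aug_gold)
print(f"RAD (assumed definition): {score:.3f}")
```

In this toy data the model answers three of the four original questions correctly but only one of those three augmented counterparts, so the score is low even though accuracy on the original questions is 75%. That gap is the kind of brittleness the abstract says classical accuracy does not expose.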
Anthology ID:
2021.acl-short.10
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
61–70
URL:
https://aclanthology.org/2021.acl-short.10
DOI:
10.18653/v1/2021.acl-short.10
Cite (ACL):
Daniel Rosenberg, Itai Gat, Amir Feder, and Roi Reichart. 2021. Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 61–70, Online. Association for Computational Linguistics.
Cite (Informal):
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions (Rosenberg et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-short.10.pdf
Video:
https://aclanthology.org/2021.acl-short.10.mp4
Data:
VisDial, Visual Question Answering