Attribute Diversity Determines the Systematicity Gap in VQA

Ian Berlot-Attwell, Kumar Agrawal, Annabelle Carrell, Yash Sharma, Naomi Saphra


Abstract
Although modern neural networks often generalize to new combinations of familiar concepts, the conditions that enable such compositionality have long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test this, we introduce a novel diagnostic dataset, CLEVR-HOPE. We find that the systematicity gap is not reduced by increasing the quantity of training data, but is reduced by increasing the diversity of training data. In particular, our experiments suggest that the more distinct attribute type combinations are seen during training, the more systematic we can expect the resulting model to be.
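As an illustration only (not the authors' released code or the paper's exact evaluation protocol), the systematicity gap described in the abstract can be understood as the accuracy difference between questions about attribute combinations seen during training and questions about held-out combinations. The sketch below makes this concrete; the function names and data layout are assumptions for the example.

```python
# Illustrative sketch only: how a systematicity gap might be measured.
# Assumes predictions and gold answers are available for two evaluation
# splits: one with attribute combinations seen in training, one with
# held-out (unseen) combinations. Names and data layout are hypothetical.
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of VQA answers predicted exactly correctly."""
    assert len(predictions) == len(answers) and len(answers) > 0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def systematicity_gap(seen_preds: Sequence[str], seen_answers: Sequence[str],
                      unseen_preds: Sequence[str], unseen_answers: Sequence[str]) -> float:
    """Accuracy on seen attribute combinations minus accuracy on unseen ones.

    A gap near zero indicates systematic generalization to novel
    combinations of familiar attributes; a large positive gap indicates
    the model relies on having seen the specific combination in training.
    """
    return accuracy(seen_preds, seen_answers) - accuracy(unseen_preds, unseen_answers)


if __name__ == "__main__":
    # Toy example with made-up answers.
    seen_preds = ["yes", "cube", "2", "red"]
    seen_gold = ["yes", "cube", "2", "blue"]
    unseen_preds = ["no", "sphere", "1", "metal"]
    unseen_gold = ["yes", "sphere", "3", "metal"]
    gap = systematicity_gap(seen_preds, seen_gold, unseen_preds, unseen_gold)
    print(f"Systematicity gap: {gap:.2f}")
```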
Anthology ID: 2024.emnlp-main.537
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 9576–9611
URL: https://aclanthology.org/2024.emnlp-main.537
Cite (ACL): Ian Berlot-Attwell, Kumar Agrawal, Annabelle Carrell, Yash Sharma, and Naomi Saphra. 2024. Attribute Diversity Determines the Systematicity Gap in VQA. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9576–9611, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Attribute Diversity Determines the Systematicity Gap in VQA (Berlot-Attwell et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.537.pdf