Combine to Describe: Evaluating Compositional Generalization in Image Captioning

Georgios Pantazopoulos, Alessandro Suglia, Arash Eshghi


Abstract
Compositionality – the ability to combine simpler concepts to understand and generate arbitrarily more complex conceptual structures – has long been thought to be the cornerstone of human language capacity. With the recent, notable success of neural models in various NLP tasks, attention has now naturally turned to the compositional capacity of these models. In this paper, we study the compositional generalization properties of image captioning models. We perform a set of experiments under controlled conditions using model and data ablations, each designed to benchmark a particular facet of compositional generalization: systematicity is the ability of a model to create novel combinations of concepts out of those observed during training; productivity is here operationalised as the capacity of a model to extend its predictions beyond the length distribution it has observed during training; and substitutivity is concerned with the robustness of the model against synonym substitutions. While previous work has focused primarily on systematicity, here we provide a more in-depth analysis of the strengths and weaknesses of state-of-the-art captioning models. Our findings demonstrate that the models we study here do not compositionally generalize in terms of systematicity and productivity; however, they are robust to some degree to synonym substitutions.
Anthology ID:
2022.acl-srw.11
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Samuel Louvan, Andrea Madotto, Brielen Madureira
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
115–131
URL:
https://aclanthology.org/2022.acl-srw.11
DOI:
10.18653/v1/2022.acl-srw.11
Cite (ACL):
Georgios Pantazopoulos, Alessandro Suglia, and Arash Eshghi. 2022. Combine to Describe: Evaluating Compositional Generalization in Image Captioning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 115–131, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Combine to Describe: Evaluating Compositional Generalization in Image Captioning (Pantazopoulos et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-srw.11.pdf
Data
MS COCO