When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks

Ankur Sikarwar, Arkil Patel, Navin Goyal


Abstract
Humans can reason compositionally while grounding language utterances in the real world. Recent benchmarks like ReaSCAN (Wu et al., 2021) use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities. In this work, we present a simple transformer-based model that outperforms specialized architectures on ReaSCAN and a modified version (Qiu et al., 2021) of gSCAN (Ruis et al., 2020). In analyzing the task, we find that identifying the target location in the grid world is the main challenge for the models. Furthermore, we show that a particular split in ReaSCAN, which tests depth generalization, is unfair. On an amended version of this split, we show that transformers can generalize to deeper input structures. Finally, we design a simpler grounded compositional generalization task, RefEx, to investigate how transformers reason compositionally. We show that a single self-attention layer with a single head generalizes to novel combinations of object attributes. Moreover, we derive a precise mathematical construction of the transformer’s computations from the learned network. Overall, we provide insights into the grounded compositional generalization task and the behaviour of transformers on it, which should be useful for researchers working in this area.
Anthology ID: 2022.emnlp-main.41
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 648–669
URL: https://aclanthology.org/2022.emnlp-main.41
Cite (ACL): Ankur Sikarwar, Arkil Patel, and Navin Goyal. 2022. When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 648–669, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks (Sikarwar et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.41.pdf