Making Transformers Solve Compositional Tasks

Santiago Ontanon; Joshua Ainslie; Zachary Fisher; Vaclav Cvicek

doi:10.18653/v1/2022.acl-long.251

Abstract

Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper, we explore the design space of Transformer models, showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. We identify Transformer configurations that generalize compositionally significantly better than previously reported in the literature in many compositional tasks. We achieve state-of-the-art results in a semantic parsing compositional generalization benchmark (COGS) and a string edit operation composition benchmark (PCFG).

Anthology ID: 2022.acl-long.251
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 3591–3607
URL: https://aclanthology.org/2022.acl-long.251/
DOI: 10.18653/v1/2022.acl-long.251
Cite (ACL): Santiago Ontanon, Joshua Ainslie, Zachary Fisher, and Vaclav Cvicek. 2022. Making Transformers Solve Compositional Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3591–3607, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Making Transformers Solve Compositional Tasks (Ontanon et al., ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.251.pdf
Software: 2022.acl-long.251.software.zip
Code: google-research/google-research
Data: CFQ, SCAN
