Compositional Networks Enable Systematic Generalization for Grounded Language Understanding

Yen-Ling Kuo, Boris Katz, Andrei Barbu


Abstract
Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in the gSCAN dataset, which explicitly measures how well an agent is able to interpret novel linguistic commands grounded in vision, e.g., novel pairings of adjectives and nouns. The key principle we employ is compositionality: that the compositional structure of networks should reflect the compositional structure of the problem domain they address, while allowing other parameters to be learned end-to-end. We build a general-purpose mechanism that enables agents to generalize their language understanding to compositional domains. Crucially, our network has the same state-of-the-art performance as prior work while generalizing its knowledge when prior work does not. Our network also provides a level of interpretability that enables users to inspect what each part of the network learns. Robust grounded language understanding without dramatic failures and without corner cases is critical to building safe and fair robots; we demonstrate the significant role that compositionality can play in achieving that goal.
Anthology ID:
2021.findings-emnlp.21
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
216–226
URL:
https://aclanthology.org/2021.findings-emnlp.21
DOI:
10.18653/v1/2021.findings-emnlp.21
Cite (ACL):
Yen-Ling Kuo, Boris Katz, and Andrei Barbu. 2021. Compositional Networks Enable Systematic Generalization for Grounded Language Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 216–226, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Compositional Networks Enable Systematic Generalization for Grounded Language Understanding (Kuo et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.21.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.21.mp4
Code:
ylkuo/compositional-gscan
Data:
gSCAN