Benchmarking Compositionality with Formal Languages

Josef Valvoda; Naomi Saphra; Jonathan Rawski; Adina Williams; Ryan Cotterell

Abstract

Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability. Whether large neural models in NLP acquire this ability while learning from data is an open question. In this paper, we look at this problem from the perspective of formal languages. We use deterministic finite-state transducers to generate an unbounded number of datasets with controllable properties governing compositionality. By randomly sampling over many transducers, we explore which of their properties (number of states, alphabet size, number of transitions, etc.) contribute to the learnability of a compositional relation by a neural network. In general, we find that the models either learn the relations completely or not at all. The key factor is transition coverage, which sets a soft learnability limit at 400 examples per transition.
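The data-generation recipe in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in valvoda/neuraltransducer). The sampling scheme, the stopping probability, the helper names `sample_dfst`, `sample_pair`, `transduce`, `make_dataset`, and `under_covered`, and the reading of "examples per transition" as the number of generated pairs whose path uses a given transition are all hypothetical choices made for illustration. Only the 400-example threshold comes from the abstract.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class DFST:
    """Deterministic FST: at most one transition per (state, input symbol)."""
    n_states: int
    start: int
    finals: set
    delta: dict  # (state, input symbol) -> (next state, output string)


def sample_dfst(n_states, in_alphabet, out_alphabet, n_transitions,
                max_out_len=2, seed=0):
    """Hypothetical sampler: pick distinct (state, symbol) slots, keeping the machine deterministic."""
    rng = random.Random(seed)
    slots = [(q, a) for q in range(n_states) for a in in_alphabet]
    if n_transitions > len(slots):
        raise ValueError("too many transitions for a deterministic machine")
    delta = {}
    for q, a in rng.sample(slots, n_transitions):
        out = "".join(rng.choice(out_alphabet)
                      for _ in range(rng.randint(1, max_out_len)))
        delta[(q, a)] = (rng.randrange(n_states), out)
    finals = set(rng.sample(range(n_states), max(1, n_states // 2)))
    return DFST(n_states, 0, finals, delta)


def transduce(fst, s):
    """Run the DFST on input s; return the output, or None if s is rejected."""
    q, out = fst.start, []
    for a in s:
        if (q, a) not in fst.delta:
            return None
        q, o = fst.delta[(q, a)]
        out.append(o)
    return "".join(out) if q in fst.finals else None


def sample_pair(fst, rng, max_len=10, stop_prob=0.3):
    """Random walk from the start state; returns (input, output, path) or None."""
    q, inp, out, path = fst.start, [], [], []
    for _ in range(max_len):
        if q in fst.finals and rng.random() < stop_prob:
            break
        options = sorted(a for (p, a) in fst.delta if p == q)
        if not options:
            break
        a = rng.choice(options)
        nxt, o = fst.delta[(q, a)]
        path.append((q, a))
        inp.append(a)
        out.append(o)
        q = nxt
    if q not in fst.finals:
        return None  # walk ended outside a final state: not in the relation
    return "".join(inp), "".join(out), path


def make_dataset(fst, n_examples, seed=0, max_attempts=100_000):
    """Collect input/output pairs and count, per transition, how many pairs use it."""
    rng = random.Random(seed)
    data, coverage = [], Counter()
    for _ in range(max_attempts):
        if len(data) >= n_examples:
            break
        res = sample_pair(fst, rng)
        if res is None:
            continue
        inp, out, path = res
        data.append((inp, out))
        coverage.update(set(path))  # one count per pair, however often it is reused
    return data, coverage


def under_covered(fst, coverage, threshold=400):
    """Transitions seen in fewer than `threshold` examples (threshold from the abstract)."""
    return [t for t in fst.delta if coverage[t] < threshold]
```

For example, `fst = sample_dfst(5, "abc", "xyz", 10, seed=1)` followed by `data, cov = make_dataset(fst, 200, seed=2)` yields a small dataset, and `under_covered(fst, cov)` lists the transitions that fall below the soft limit. Counting examples per transition rather than raw uses is one plausible reading of "coverage"; swapping `set(path)` for `path` gives the other.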

Anthology ID:: 2022.coling-1.525
Original:: 2022.coling-1.525v1
Version 2:: 2022.coling-1.525v2
Volume:: Proceedings of the 29th International Conference on Computational Linguistics
Month:: October
Year:: 2022
Address:: Gyeongju, Republic of Korea
Editors:: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:: COLING
Publisher:: International Committee on Computational Linguistics
Pages:: 6007–6018
URL:: https://aclanthology.org/2022.coling-1.525
Cite (ACL):: Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, and Ryan Cotterell. 2022. Benchmarking Compositionality with Formal Languages. In Proceedings of the 29th International Conference on Computational Linguistics, pages 6007–6018, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):: Benchmarking Compositionality with Formal Languages (Valvoda et al., COLING 2022)
PDF:: https://aclanthology.org/2022.coling-1.525.pdf
Code:: valvoda/neuraltransducer
Data:: gSCAN, SCAN
