We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of (total functional) transductions. We do so using variants of RASP, a programming language designed to help people “think like transformers,” as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence transductions and show that it computes exactly the first-order rational transductions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular transductions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular transductions. Finally, we show that masked average-hard attention transformers can simulate S-RASP.
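To make the three tiers concrete, here is a minimal Python sketch of the example transductions named above. This is our own illustration in plain Python, not the paper's formal RASP programs; all function names (`rotate`, `prefix_sum`, `first_half`, `square`) are ours.

```python
def rotate(w):
    """B-RASP-tier example (first-order rational): rotate the
    string left by one position."""
    return w[1:] + w[:1] if w else w

def prefix_sum(bits):
    """S-RASP's extra primitive: at each position i, the count of
    1-bits among positions 0..i (inclusive)."""
    total, out = 0, []
    for b in bits:
        total += b
        out.append(total)
    return out

def first_half(w):
    """B-RASP[pos]-tier example: position arithmetic keeps tokens
    whose index is below len(w) // 2 (copying the first half)."""
    n = len(w)
    return "".join(c for i, c in enumerate(w) if i < n // 2)

def square(w):
    """S-RASP-tier example: map w to w repeated len(w) times.
    We compute the length-n^2 output directly; output position j
    holds input symbol w[j mod n]."""
    n = len(w)
    return "".join(w[j % n] for j in range(n * n))

assert rotate("abc") == "bca"
assert prefix_sum([1, 0, 1, 1]) == [1, 1, 2, 3]
assert first_half("abcdef") == "abc"
assert square("ab") == "abab"
```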
Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability. Whether large neural models in NLP acquire this ability while learning from data is an open question. In this paper, we examine this problem from the perspective of formal languages. We use deterministic finite-state transducers to make an unbounded number of datasets with controllable properties governing compositionality. By randomly sampling over many transducers, we explore which of their properties (number of states, alphabet size, number of transitions, etc.) contribute to the learnability of a compositional relation by a neural network. In general, we find that the models either learn the relations completely or not at all. The key factor is transition coverage, which sets a soft learnability limit of 400 examples per transition.
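The data-generation pipeline can be pictured with a short, hypothetical Python sketch (not the authors' released code): sample a random deterministic finite-state transducer, label random input strings with it, and measure how many examples exercise each transition, the coverage quantity the abstract ties to learnability. All function names and parameters here are illustrative.

```python
import random
from collections import Counter

def sample_transducer(n_states, in_alpha, out_alpha, seed=0):
    """Random deterministic transducer: each (state, input symbol)
    pair maps to a (next state, output symbol) pair."""
    rng = random.Random(seed)
    return {
        (q, a): (rng.randrange(n_states), rng.choice(out_alpha))
        for q in range(n_states)
        for a in in_alpha
    }

def run(delta, word):
    """Transduce a word, starting from state 0."""
    q, out = 0, []
    for a in word:
        q, b = delta[(q, a)]
        out.append(b)
    return "".join(out)

def make_dataset(delta, in_alpha, n_examples, max_len=10, seed=1):
    """Label uniformly random input strings with the transducer."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examples):
        w = "".join(rng.choice(in_alpha)
                    for _ in range(rng.randint(1, max_len)))
        data.append((w, run(delta, w)))
    return data

def coverage(delta, data):
    """Occurrences of each transition in the training data; a proxy
    for the examples-per-transition quantity discussed above."""
    hits = Counter()
    for w, _ in data:
        q = 0
        for a in w:
            hits[(q, a)] += 1
            q = delta[(q, a)][0]
    return {t: hits[t] for t in delta}

delta = sample_transducer(n_states=3, in_alpha="ab", out_alpha="xy")
data = make_dataset(delta, "ab", n_examples=2000)
print(min(coverage(delta, data).values()))  # least-covered transition
```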