Evaluating Transformer’s Ability to Learn Mildly Context-Sensitive Languages

Shunjie Wang, Shane Steinert-Threlkeld


Abstract
Although Transformers perform well on NLP tasks, recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages. These findings motivate us to consider their implications for modeling natural language, which is hypothesized to be mildly context-sensitive. We test the Transformer's ability to learn mildly context-sensitive languages of varying complexity, and find that it generalizes well to unseen in-distribution data, but its ability to extrapolate to longer strings is worse than that of LSTMs. Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior, which may have helped the models solve the languages.
Anthology ID:
2023.blackboxnlp-1.21
Volume:
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Month:
December
Year:
2023
Address:
Singapore
Editors:
Yonatan Belinkov, Sophie Hao, Jaap Jumelet, Najoung Kim, Arya McCarthy, Hosein Mohebbi
Venues:
BlackboxNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
271–283
URL:
https://aclanthology.org/2023.blackboxnlp-1.21
DOI:
10.18653/v1/2023.blackboxnlp-1.21
Cite (ACL):
Shunjie Wang and Shane Steinert-Threlkeld. 2023. Evaluating Transformer’s Ability to Learn Mildly Context-Sensitive Languages. In Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 271–283, Singapore. Association for Computational Linguistics.
Cite (Informal):
Evaluating Transformer’s Ability to Learn Mildly Context-Sensitive Languages (Wang & Steinert-Threlkeld, BlackboxNLP-WS 2023)
PDF:
https://aclanthology.org/2023.blackboxnlp-1.21.pdf