Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations

Ekaterina Taktasheva, Vladislav Mikhailov, Ekaterina Artemova


Abstract
Recent research has explored a new experimental field centered on text perturbations, revealing that shuffled word order has little to no impact on the downstream performance of Transformer-based language models across many NLP tasks. These findings contradict the common understanding of how the models encode hierarchical and structural information and even question whether word order is modeled with position embeddings. To this end, this paper proposes nine probing datasets organized by the type of controllable text perturbation for three Indo-European languages with varying degrees of word order flexibility: English, Swedish, and Russian. Based on the probing analysis of the M-BERT and M-BART models, we report that syntactic sensitivity depends on the language and on the models' pre-training objectives. We also find that the sensitivity grows across layers along with the increase in perturbation granularity. Last but not least, we show that the models barely use positional information to induce syntactic trees from their intermediate self-attention and contextualized representations.
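The perturbations described in the abstract reorder words in a controlled way. The snippet below is a minimal sketch, assuming one simple kind of controllable perturbation: a tokenized sentence is split into contiguous n-grams, and the n-grams are randomly permuted while word order inside each n-gram is kept. The function name `shuffle_ngrams` and the chunk-size parameter `n` are hypothetical illustrations. This is not the authors' dataset-construction procedure, which is available in the evtaktasheva/dependency_extraction repository.

```python
import random
from typing import List, Sequence


def shuffle_ngrams(tokens: Sequence[str], n: int, seed: int = 0) -> List[str]:
    """Hypothetical controllable perturbation (illustrative only).

    Splits `tokens` into contiguous chunks of `n` tokens (the last chunk may
    be shorter) and randomly permutes the chunks. Word order inside each
    chunk is preserved, so `n` controls how local the reordering is:
    n=1 shuffles individual words, n >= len(tokens) leaves the input unchanged.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    chunks = [list(tokens[i:i + n]) for i in range(0, len(tokens), n)]
    rng = random.Random(seed)  # local RNG: same seed gives the same perturbation
    rng.shuffle(chunks)
    # Flatten the permuted chunks back into a single token list.
    return [tok for chunk in chunks for tok in chunk]


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for size in (1, 2, 3):
        print(size, " ".join(shuffle_ngrams(sentence, n=size, seed=13)))
```

Using a seeded `random.Random` instance rather than the global RNG keeps each perturbed copy reproducible, which matters when the same perturbed sentences must be fed to several models or layers for comparison.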
Anthology ID:
2021.mrl-1.17
Volume:
Proceedings of the 1st Workshop on Multilingual Representation Learning
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Duygu Ataman, Alexandra Birch, Alexis Conneau, Orhan Firat, Sebastian Ruder, Gozde Gul Sahin
Venue:
MRL
Publisher:
Association for Computational Linguistics
Pages:
191–210
URL:
https://aclanthology.org/2021.mrl-1.17
DOI:
10.18653/v1/2021.mrl-1.17
Cite (ACL):
Ekaterina Taktasheva, Vladislav Mikhailov, and Ekaterina Artemova. 2021. Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations. In Proceedings of the 1st Workshop on Multilingual Representation Learning, pages 191–210, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations (Taktasheva et al., MRL 2021)
PDF:
https://aclanthology.org/2021.mrl-1.17.pdf
Video:
https://aclanthology.org/2021.mrl-1.17.mp4
Code:
evtaktasheva/dependency_extraction
Data:
BLiMP, CoLA, GLUE