Word-order Typology in Multilingual BERT: A Case Study in Subordinate-Clause Detection

Dmitry Nikolaev, Sebastian Padó

Abstract
The capabilities and limitations of BERT and similar models are still unclear when it comes to learning syntactic abstractions, in particular across languages. In this paper, we use the task of subordinate-clause detection within and across languages to probe these properties. We show that this task is deceptively simple, with easy gains offset by a long tail of harder cases, and that BERT’s zero-shot performance is dominated by word-order effects, mirroring the SVO/VSO/SOV typology.
Anthology ID:
2022.sigtyp-1.2
Volume:
Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
July
Year:
2022
Address:
Seattle, Washington
Venues:
NAACL | SIGTYP
SIG:
SIGTYP
Publisher:
Association for Computational Linguistics
Pages:
11–21
URL:
https://aclanthology.org/2022.sigtyp-1.2
DOI:
10.18653/v1/2022.sigtyp-1.2
Bibkey:
Cite (ACL):
Dmitry Nikolaev and Sebastian Padó. 2022. Word-order Typology in Multilingual BERT: A Case Study in Subordinate-Clause Detection. In Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 11–21, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
Word-order Typology in Multilingual BERT: A Case Study in Subordinate-Clause Detection (Nikolaev & Padó, SIGTYP 2022)
PDF:
https://aclanthology.org/2022.sigtyp-1.2.pdf