A multilabel approach to morphosyntactic probing

Naomi Shapiro; Amandalynne Paullada; Shane Steinert-Threlkeld

doi:10.18653/v1/2021.findings-emnlp.382

Abstract

We propose using a multilabel probing task to assess the morphosyntactic representations of multilingual word embeddings. This tweak on canonical probing makes it easy to explore morphosyntactic representations, both holistically and at the level of individual features (e.g., gender, number, case), and leads more naturally to the study of how language models handle co-occurring features (e.g., agreement phenomena). We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but robust paradigm, we verify that multilingual BERT renders many morphosyntactic features simultaneously extractable. We further evaluate the probes on six held-out languages: Arabic, Chinese, Marathi, Slovenian, Tagalog, and Yoruba. This zero-shot style of probing has the added benefit of revealing which cross-linguistic properties a language model recognizes as being shared by multiple languages.
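To make the multilabel framing concrete, here is a minimal sketch of a probe that predicts several morphosyntactic features at once from a frozen word vector. It is not the authors' implementation. The label inventory `FEATURES`, the toy vectors `TOY_X`, and the helpers `train_probe` and `predict` are hypothetical names introduced for illustration, and the 4-dimensional vectors stand in for real multilingual BERT embeddings. The one assumption it illustrates is that each feature gets its own independent sigmoid output, so a word can carry several labels simultaneously.

```python
import math
import random

# Hypothetical label inventory (UD-style feature=value pairs), for illustration only.
FEATURES = ["Number=Sing", "Number=Plur", "Case=Nom", "Case=Acc"]

# Toy stand-ins for frozen word embeddings; real probes would use model activations.
TOY_X = [[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0],
         [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
# One binary indicator per feature; several can be active for the same word.
TOY_Y = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(weights_k, bias_k, x):
    return sum(w * xi for w, xi in zip(weights_k, x)) + bias_k


def train_probe(embeddings, labels, epochs=200, lr=0.5, seed=0):
    """Fit one logistic unit per label with SGD on binary cross-entropy."""
    rng = random.Random(seed)
    dim, n_labels = len(embeddings[0]), len(labels[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(n_labels)]
    biases = [0.0] * n_labels
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            for k in range(n_labels):
                p = sigmoid(logit(weights[k], biases[k], x))
                grad = p - y[k]  # d(BCE)/d(logit) for label k
                for i in range(dim):
                    weights[k][i] -= lr * grad * x[i]
                biases[k] -= lr * grad
    return weights, biases


def predict(weights, biases, x, threshold=0.5):
    """Return a 0/1 decision for every feature independently."""
    return [int(sigmoid(logit(wk, bk, x)) >= threshold)
            for wk, bk in zip(weights, biases)]


weights, biases = train_probe(TOY_X, TOY_Y)
for x in TOY_X:
    active = [f for f, on in zip(FEATURES, predict(weights, biases, x)) if on]
    print(active)
```

Because every feature has its own output, accuracy can be reported per feature or over the whole label set, and co-occurring features can be read off a single prediction.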

Anthology ID: 2021.findings-emnlp.382
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4486–4524
URL: https://aclanthology.org/2021.findings-emnlp.382/
DOI: 10.18653/v1/2021.findings-emnlp.382
Cite (ACL): Naomi Shapiro, Amandalynne Paullada, and Shane Steinert-Threlkeld. 2021. A multilabel approach to morphosyntactic probing. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4486–4524, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): A multilabel approach to morphosyntactic probing (Shapiro et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.382.pdf
Video: https://aclanthology.org/2021.findings-emnlp.382.mp4
