Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

Ester Hlavnova, Sebastian Ruder


Abstract
A challenge towards developing NLP systems for the world’s languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models’ behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.
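The abstract describes generating behavioral tests from templates with language-specific slots. As a rough illustration only (this is not the authors' M2C API; the function and slot names below are hypothetical), such template expansion can be sketched as:

```python
# Hypothetical sketch of template-based test generation in the spirit of
# behavioral testing frameworks: a template with named {slots} is expanded
# into concrete test cases by filling every combination of slot values.
from itertools import product

def expand(template, slots):
    """Return one test string per combination of slot values."""
    names = list(slots)
    cases = []
    for values in product(*(slots[n] for n in names)):
        filled = template
        for name, value in zip(names, values):
            filled = filled.replace("{" + name + "}", value)
        cases.append(filled)
    return cases

# Example: a question-answering test probing temporal expressions.
template = "{name} went to {place} on {day}. Where was {name} on {day}?"
slots = {
    "name": ["Amina", "Juma"],
    "place": ["the market", "school"],
    "day": ["Monday", "Friday"],
}
tests = expand(template, slots)
print(len(tests))  # 2 * 2 * 2 = 8 generated test cases
```

A morphologically-aware framework would additionally inflect slot fillers for case, agreement, and compounding as each target language requires, which a plain string substitution like this cannot do.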
Anthology ID:
2023.acl-long.396
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7181–7198
URL:
https://aclanthology.org/2023.acl-long.396
DOI:
10.18653/v1/2023.acl-long.396
Cite (ACL):
Ester Hlavnova and Sebastian Ruder. 2023. Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7181–7198, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features (Hlavnova & Ruder, ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.396.pdf
Video:
https://aclanthology.org/2023.acl-long.396.mp4