A Linguistically-informed Comparison between Multilingual BERT and Language-specific BERT Models: The Case of Differential Object Marking in Romanian

Maria Tepei; Jelke Bloem

Abstract

Current linguistic challenge datasets for language models focus on phenomena that exist in English, which may lead to a lack of attention to typological features beyond English. This is a particular issue for multilingual models, which may be biased towards English by their training data; this bias may be amplified if benchmarks are also English-centered. We present the syntactically and semantically complex phenomenon of Differential Object Marking (DOM) in Romanian as a challenging Masked Language Modelling task and compare the performance of monolingual and multilingual models. Results indicate that Romanian-specific BERT models perform better than an equivalent multilingual one in representing this phenomenon.

Anthology ID: 2025.ranlp-1.147
Volume: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 1271–1281
URL: https://aclanthology.org/2025.ranlp-1.147/
Cite (ACL): Maria Tepei and Jelke Bloem. 2025. A Linguistically-informed Comparison between Multilingual BERT and Language-specific BERT Models: The Case of Differential Object Marking in Romanian. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 1271–1281, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): A Linguistically-informed Comparison between Multilingual BERT and Language-specific BERT Models: The Case of Differential Object Marking in Romanian (Tepei & Bloem, RANLP 2025)
PDF: https://aclanthology.org/2025.ranlp-1.147.pdf
