Serbian SuperGLUE: Towards an Evaluation Benchmark for South Slavic Language Models

Mitar Perovic; Teodora Mihajlov

Abstract

We introduce Serbian SuperGLUE, a comprehensive benchmark for evaluating natural language understanding in Serbian, adapted from the English SuperGLUE benchmark. The benchmark comprises seven tasks spanning question answering, natural language inference, and coreference resolution, created through a combination of LLM-based translation with automatic post-editing and native data generation. We evaluate seven encoder-based language models, including both Serbian-specific (BERTić, Jerteh) and multilingual models (mmBERT, XLM-RoBERTa variants). Our results reveal that multilingual models remain competitive with language-specific alternatives, with mmBERT achieving the best performance on RTE (75.7%) and XLM-R-BERTić leading on BoolQ (82.0%). We observe significant training variance on smaller datasets, with standard deviations exceeding 10% in some configurations, highlighting the importance of multi-seed evaluation for low-resource benchmarking. We release the benchmark, evaluation code, and model checkpoints to facilitate reproducible research on South Slavic language understanding.
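The multi-seed evaluation mentioned in the abstract can be illustrated with a minimal sketch that reports the mean and sample standard deviation of a metric across training seeds. The function name `summarize_seeds` and the scores below are hypothetical placeholders chosen for illustration; they are not taken from the paper or its released evaluation code.

```python
from statistics import mean, stdev


def summarize_seeds(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    if len(scores) < 2:
        # A spread estimate needs at least two independent runs.
        raise ValueError("need at least two seeds to estimate variance")
    return mean(scores), stdev(scores)


# Placeholder accuracies (percent) for one model on one task, one per seed.
# Illustrative values only, NOT results from the paper.
placeholder_scores = [70.0, 75.0, 80.0]

avg, sd = summarize_seeds(placeholder_scores)
print(f"accuracy: {avg:.1f} +/- {sd:.1f} over {len(placeholder_scores)} seeds")
```

Reporting the spread alongside the mean makes it visible when a gap between two models is smaller than the run-to-run variation, which is the situation the abstract describes for the smaller datasets.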

Anthology ID: 2026.loreslm-1.30
Volume: Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue: LoResLM
Publisher: Association for Computational Linguistics
Pages: 347–361
URL: https://aclanthology.org/2026.loreslm-1.30/
Cite (ACL): Mitar Perovic and Teodora Mihajlov. 2026. Serbian SuperGLUE: Towards an Evaluation Benchmark for South Slavic Language Models. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 347–361, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Serbian SuperGLUE: Towards an Evaluation Benchmark for South Slavic Language Models (Perovic & Mihajlov, LoResLM 2026)
PDF: https://aclanthology.org/2026.loreslm-1.30.pdf
