2026
Serbian SuperGLUE: Towards an Evaluation Benchmark for South Slavic Language Models
Mitar Perovic | Teodora Mihajlov
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
We introduce Serbian SuperGLUE, a comprehensive benchmark for evaluating natural language understanding in Serbian, adapted from English SuperGLUE. The benchmark comprises seven tasks spanning question answering, natural language inference, and coreference resolution, created through a combination of LLM-based translation with automatic post-editing and native data generation. We evaluate seven encoder-based language models, including both Serbian-specific models (BERTić, Jerteh) and multilingual models (mmBERT, XLM-RoBERTa variants). Our results show that multilingual models remain competitive with language-specific alternatives, with mmBERT achieving the best performance on RTE (75.7%) and XLM-R-BERTić leading on BoolQ (82.0%). We observe significant training variance on smaller datasets, with standard deviations exceeding 10% in some configurations, highlighting the importance of multi-seed evaluation for low-resource benchmarking. We release the benchmark, evaluation code, and model checkpoints to facilitate reproducible research on South Slavic language understanding.