Toward a Gold-Standard Benchmark for Evaluating Ukrainian Language Proficiency in LLMs

Svitlana Galeshchuk, Yuliia Maksymiuk, Yuliia Chernobrov, Nina Stankevych, Oleksandra Antoniv, Nataliia Faryna, Oksana Popkova


Abstract
The paper presents an expert-curated benchmark for assessing Ukrainian proficiency in LLMs, focusing on grammar and orthography as core components of language competence. Prepared by professional linguists, the proposed gold-standard dataset is designed to test normative Ukrainian usage. The benchmark is further used to evaluate a range of LLMs, including Ukrainian-focused, multilingual, and large-scale models, under zero-shot and few-shot prompting in Ukrainian and English. Across these settings, smaller models achieve no more than 42.1% accuracy, while large-scale LLMs reach up to 59.6%. These results show that standard Ukrainian remains challenging for current LLMs and highlight the need for stronger language-specific evaluation and adaptation.
Anthology ID:
2026.unlp-1.12
Volume:
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Month:
May
Year:
2026
Address:
Lviv, Ukraine
Editor:
Mariana Romanyshyn
Venue:
UNLP
Publisher:
Association for Computational Linguistics
Pages:
121–135
URL:
https://aclanthology.org/2026.unlp-1.12/
Cite (ACL):
Svitlana Galeshchuk, Yuliia Maksymiuk, Yuliia Chernobrov, Nina Stankevych, Oleksandra Antoniv, Nataliia Faryna, and Oksana Popkova. 2026. Toward a Gold-Standard Benchmark for Evaluating Ukrainian Language Proficiency in LLMs. In Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026), pages 121–135, Lviv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Toward a Gold-Standard Benchmark for Evaluating Ukrainian Language Proficiency in LLMs (Galeshchuk et al., UNLP 2026)
PDF:
https://aclanthology.org/2026.unlp-1.12.pdf