Eval-UA-tion 1.0: Benchmark for Evaluating Ukrainian (Large) Language Models

Serhii Hamotskyi, Anna-Izabella Levbarg, Christian Hänig


Abstract
In this paper, we introduce Eval-UA-tion, a set of novel Ukrainian-language datasets aimed at evaluating the performance of language models on the Ukrainian language. The tasks include UA-CBT (inspired by the Children’s Book Test, a fill-in-the-gaps type task aimed at gauging the extent to which a story narrative is understood), UP-Titles (where the online newspaper Ukrainska Pravda’s articles have to be matched to the correct title among 10 similar ones), and LMentry-static-UA/LMES (inspired by the LMentry benchmark, a set of tasks simple to solve for humans but hard for LMs, such as ‘which of these words is longer’ and ‘what is the fifth word of this sentence’). With the exception of UP-Titles, the tasks are built to minimize contamination: they use material unlikely to be present in the training sets of language models and include a split for few-shot prompting that likewise minimizes contamination. For each task, human and random baselines are provided.
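The two LMentry-style questions quoted above, and the idea of a random baseline for a 10-way choice such as UP-Titles, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, word length is assumed to mean character count, words are assumed to be whitespace-separated, and the random baseline is assumed to be a uniform guess over the candidates.

```python
def longer_word(a: str, b: str) -> str:
    """Return the longer of two words by character count.

    Assumption: ties return the first word; the benchmark may treat ties differently.
    """
    return a if len(a) >= len(b) else b


def nth_word(sentence: str, n: int) -> str:
    """Return the n-th word of a sentence (1-indexed).

    Assumption: words are separated by whitespace and punctuation is not stripped.
    """
    return sentence.split()[n - 1]


def uniform_guess_accuracy(num_options: int) -> float:
    """Expected accuracy of picking one of num_options candidates uniformly at random."""
    return 1.0 / num_options


# Hypothetical example inputs, not taken from the datasets.
print(longer_word("кіт", "собака"))
print(nth_word("Я дуже люблю читати цікаві книжки", 5))
print(uniform_guess_accuracy(10))
```

Such questions are trivial to answer programmatically, which is what makes them useful probes: a model that fails them is failing at something a few lines of code can do.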
Anthology ID:
2024.unlp-1.13
Volume:
Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Mariana Romanyshyn, Nataliia Romanyshyn, Andrii Hlybovets, Oleksii Ignatenko
Venue:
UNLP
Publisher:
ELRA and ICCL
Pages:
109–119
URL:
https://aclanthology.org/2024.unlp-1.13
Cite (ACL):
Serhii Hamotskyi, Anna-Izabella Levbarg, and Christian Hänig. 2024. Eval-UA-tion 1.0: Benchmark for Evaluating Ukrainian (Large) Language Models. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 109–119, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Eval-UA-tion 1.0: Benchmark for Evaluating Ukrainian (Large) Language Models (Hamotskyi et al., UNLP 2024)
PDF:
https://aclanthology.org/2024.unlp-1.13.pdf