EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian

Daryna Dementieva, Nikolay Babakov, Alexander Fraser


Abstract
While Ukrainian NLP has seen progress in many text processing tasks, emotion classification remains an underexplored area with no publicly available benchmark to date. In this work, we introduce **EmoBench-UA**, the first annotated dataset for emotion detection in Ukrainian texts. Our annotation schema is adapted from the guidelines of previous English-centric work on emotion detection (Mohammad et al., 2018; Mohammad, 2022). The dataset was created through crowdsourcing on the Toloka.ai platform, ensuring a high-quality annotation process. We then evaluate a range of approaches on the collected dataset, from linguistic-based baselines and synthetic data translated from English to large language models (LLMs). Our findings highlight the challenges of emotion classification in non-mainstream languages like Ukrainian and emphasize the need for further development of Ukrainian-specific models and training resources.
Anthology ID:
2025.findings-emnlp.107
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2025–2048
URL:
https://aclanthology.org/2025.findings-emnlp.107/
Cite (ACL):
Daryna Dementieva, Nikolay Babakov, and Alexander Fraser. 2025. EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2025–2048, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian (Dementieva et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.107.pdf
Checklist:
 2025.findings-emnlp.107.checklist.pdf