Entropy of Ukrainian

Anton Lavreniuk, Mykyta Mudryi, Markiian Chaklosh


Abstract
In natural language processing, the entropy of a language is a measure of its unpredictability and complexity. The first study on this subject was conducted by Claude Shannon in 1951. By having participants predict the next character in a sentence, he was able to approximate the entropy of the English language. Several follow-up studies by other authors have since been conducted for English, and one for Hebrew. However, to date, Shannon’s experiment has never been conducted for Ukrainian. In this paper, we perform this experiment for Ukrainian by recruiting 184 volunteers using social media channels. We rely on techniques used for English to approximate the entropy value of Ukrainian. The final result is an upper bound of H_upper ≈ 1.201 bits per character. We compare this to the performance of current Large Language Models. The methods and code used are also documented and published, along with a discussion of the main challenges encountered.
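As an illustration of how guessing data can be turned into an entropy estimate, the following minimal sketch applies the bounds from Shannon's 1951 guessing experiment. It assumes each character is recorded as the number of guesses a participant needed to get it right. The function name `shannon_bounds` and the input format are hypothetical and are not taken from the paper's published code, so the authors' actual pipeline may differ.

```python
import math
from collections import Counter


def shannon_bounds(guess_counts):
    """Return (lower, upper) entropy bounds in bits per character.

    guess_counts: one integer per predicted character, the number of
    guesses needed before the correct character was named (1 = first try).
    This input format is an assumption made for illustration.
    """
    n = len(guess_counts)
    if n == 0:
        raise ValueError("guess_counts must not be empty")

    freq = Counter(guess_counts)
    max_rank = max(freq)
    # q[i - 1] is the relative frequency of needing exactly i guesses.
    q = [freq.get(i, 0) / n for i in range(1, max_rank + 1)]

    # Upper bound: entropy of the guess-number distribution.
    upper = -sum(p * math.log2(p) for p in q if p > 0)

    # Lower bound: sum over i of i * (q_i - q_{i+1}) * log2(i),
    # with q_{K+1} taken as 0 for the largest observed rank K.
    lower = 0.0
    for i in range(1, max_rank + 1):
        q_next = q[i] if i < max_rank else 0.0
        lower += i * (q[i - 1] - q_next) * math.log2(i)

    return lower, upper


if __name__ == "__main__":
    # Toy data, not results from the study.
    lower, upper = shannon_bounds([1, 1, 2, 1, 3, 1, 2, 1])
    print(f"lower = {lower:.3f}, upper = {upper:.3f} bits per character")
```

The lower bound is only meaningful when the guess frequencies are non-increasing in the guess number, which small samples may violate. The upper bound is the quantity that corresponds to the H_upper figure reported in the abstract.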
Anthology ID:
2026.unlp-1.4
Volume:
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Month:
May
Year:
2026
Address:
Lviv, Ukraine
Editor:
Mariana Romanyshyn
Venue:
UNLP
Publisher:
Association for Computational Linguistics
Pages:
33–40
URL:
https://aclanthology.org/2026.unlp-1.4/
Cite (ACL):
Anton Lavreniuk, Mykyta Mudryi, and Markiian Chaklosh. 2026. Entropy of Ukrainian. In Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026), pages 33–40, Lviv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Entropy of Ukrainian (Lavreniuk et al., UNLP 2026)
PDF:
https://aclanthology.org/2026.unlp-1.4.pdf