Yana Veitsman

2025

Recent Advancements and Challenges of Turkic Central Asian Language Processing
Yana Veitsman | Mareike Hartmann
Proceedings of the First Workshop on Language Models for Low-Resource Languages

Research in NLP for Central Asian Turkic languages (Kazakh, Uzbek, Kyrgyz, and Turkmen) faces typical low-resource challenges such as data scarcity, limited linguistic resources, and limited technology development. However, recent advancements include the collection of language-specific datasets and the development of models for downstream tasks. This paper summarizes that progress and identifies future research directions. It provides a high-level overview of each language’s linguistic features, the current technology landscape, the application of transfer learning from higher-resource languages, and the availability of labeled and unlabeled data. By outlining the current state, we hope to inspire and facilitate future research.

Co-authors

Mareike Hartmann (1)

Venues

LoResLM (1)
WS (1)
