Arofat Akhundjanova
2025
Harmonizing Annotation of Turkic Postverbial Constructions: A Comparative Study of UD Treebanks
Arofat Akhundjanova
Proceedings of the 18th Workshop on Building and Using Comparable Corpora (BUCC)
As the number of treebanks within the same language family continues to grow, the importance of establishing consistent annotation practices has become increasingly evident. In this paper, we evaluate various approaches to annotating Turkic postverbial constructions across UD treebanks. Our comparative analysis reveals that none of the existing methods fully capture the unique semantic and syntactic characteristics of these complex constructions. This underscores the need to adopt a balanced approach that can achieve broad consensus and be implemented consistently across Turkic treebanks. By examining the phenomenon and the available annotation strategies, our study aims to improve the consistency of Turkic UD treebanks and enhance their utility for cross-linguistic research.
BBPOS: BERT-based Part-of-Speech Tagging for Uzbek
Latofat Bobojonova
|
Arofat Akhundjanova
|
Phil Sidney Ostheimer
|
Sophie Fellenz
Proceedings of the First Workshop on Language Models for Low-Resource Languages
This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers.