Mukhammadsaid Mamasaidov
2024
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak
Mukhammadsaid Mamasaidov | Abror Shopulatov
Proceedings of the Ninth Conference on Machine Translation
This study presents several contributions for the Karakalpak language: a FLORES+ devtest dataset translated into Karakalpak; parallel corpora for Uzbek-Karakalpak, Russian-Karakalpak, and English-Karakalpak of 100,000 pairs each; and open-sourced fine-tuned neural models for translation across these languages. Our experiments compare different model variants and training approaches, demonstrating improvements over existing baselines. This work, conducted as part of the Open Language Data Initiative (OLDI) shared task, aims to advance machine translation capabilities for Karakalpak and contribute to expanding linguistic diversity in NLP technologies.
2021
UZWORDNET: A Lexical-Semantic Database for the Uzbek Language
Alessandro Agostini | Timur Usmanov | Ulugbek Khamdamov | Nilufar Abdurakhmonova | Mukhammadsaid Mamasaidov
Proceedings of the 11th Global Wordnet Conference
The results reported in this paper aim to increase the presence of the Uzbek language on the Internet and its usability within IT applications. We describe the initial development of a “word-net” for the Uzbek language that is compatible with Princeton WordNet. We call it UZWORDNET. In the current version, UZWORDNET contains 28140 synsets, 64389 senses and 20683 words; its estimated accuracy is 75.98%. To the best of our knowledge, it is the largest wordnet for Uzbek existing to date, and the second wordnet developed overall.