Dmitry Novokshanov

2026

Shughni Machine Translation Enhanced by Donor Languages
Dmitry Novokshanov | Innokentiy S. Humonen | Ilya Makarov
The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family

This paper presents the first machine translation system for Shughni, an extremely lowresource Eastern Iranian language spoken in Tajikistan and Afghanistan. We fine-tune NLLB-200 models and explore auxiliary language selection through typological similarity and "super-donor" experiments. Our final Shughni–Russian model achieves a chrF++ score of 36.3 (45.7 on BivalTyp data), establishing the first computational translation resource for this language. Beyond reporting system performance, this work demonstrates a practical path toward supporting languages with virtually no prior MT resources. Our demo system with Shughni-Russian- English translation (Russian serves as a pivot language for the Shughni- English pair) is available on Hugging- Face (https://huggingface.co/spaces/Novokshanov/Shughni-Translator).

pdf bib abs

Data-Centric Approach at the LoResMT 2026 Turkic Translation Challenge: Russian-Kyrgyz
Dmitry Novokshanov
Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026)

We describe our submission to the Turkic languages translation challenge at LoResMT 2026, which focuses on translation from Russian into Kyrgyz. Our approach leverages parallel data, synthetic translations, a comprehensive filtering pipeline and a four-stage curriculum learning strategy. We compare our system with contemporary baselines and present the model that achieves a chrF++ score of 49.1 and takes first place in the competition.

2022

pdf bib abs

Digital Resources for the Shughni Language
Yury Makarov | Maksim Melenchenko | Dmitry Novokshanov
Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference

This paper describes the Shughni Documentation Project consisting of the Online Shughni Dictionary, morphological analyzer, orthography converter, and Shughni corpus. The online dictionary has not only basic functions such as finding words but also facilitates more complex tasks. Representing a lexeme as a network of database sections makes it possible to search in particular domains (e.g., in meanings only), and the system of labels facilitates conditional search queries. Apart from this, users can make search queries and view entries in different orthographies of the Shughni language and send feedback in case they spot mistakes. Editors can add, modify, or delete entries without programming skills via an intuitive interface. In future, such website architecture can be applied to creating a lexical database of Iranian languages. The morphological analyzer performs automatic analysis of Shughni texts, which is useful for linguistic research and documentation. Once the analysis is complete, homonymy resolution must be conducted so that the annotated texts are ready to be uploaded to the corpus. The analyzer makes use of the orthographic converter, which helps to tackle the problem of spelling variability in Shughni, a language with no standard literary tradition.

Co-authors

Venues

Fix author