Innokentiy S. Humonen


2026

This paper presents the first machine translation system for Shughni, an extremely low-resource Eastern Iranian language spoken in Tajikistan and Afghanistan. We fine-tune NLLB-200 models and explore auxiliary language selection through typological similarity and "super-donor" experiments. Our final Shughni–Russian model achieves a chrF++ score of 36.3 (45.7 on BivalTyp data), establishing the first computational translation resource for this language. Beyond reporting system performance, this work demonstrates a practical path toward supporting languages with virtually no prior MT resources. Our demo system with Shughni–Russian–English translation (Russian serves as a pivot language for the Shughni–English pair) is available on Hugging Face (https://huggingface.co/spaces/Novokshanov/Shughni-Translator).