Shughni Machine Translation Enhanced by Donor Languages

Dmitry Novokshanov, Innokentiy S. Humonen, Ilya Makarov


Abstract
This paper presents the first machine translation system for Shughni, an extremely low-resource Eastern Iranian language spoken in Tajikistan and Afghanistan. We fine-tune NLLB-200 models and explore auxiliary language selection through typological similarity and "super-donor" experiments. Our final Shughni–Russian model achieves a chrF++ score of 36.3 (45.7 on BivalTyp data), establishing the first computational translation resource for this language. Beyond reporting system performance, this work demonstrates a practical path toward supporting languages with virtually no prior MT resources. Our demo system, which offers Shughni–Russian–English translation (with Russian serving as a pivot language for the Shughni–English pair), is available on Hugging Face (https://huggingface.co/spaces/Novokshanov/Shughni-Translator).
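
The results above are reported as chrF++ scores. For readers unfamiliar with the metric, the following is a minimal sentence-level Python sketch of the standard chrF++ definition. It uses character n-grams of order 1 to 6 plus word n-grams of order 1 to 2, averages precision and recall over all orders, and combines them as an F-score with beta = 2, so recall weighs more than precision. It is an illustration only and is assumed not to be the paper's evaluation code. It splits words on whitespace, removes spaces before extracting character n-grams, skips n-gram orders that a sentence is too short to have, and scores single sentences, so its numbers will not exactly match a reference implementation such as sacrebleu's corpus-level chrF++. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams of order n, with spaces removed first."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Word n-grams of order n over whitespace-separated tokens."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf_pp(hypothesis: str, reference: str,
            char_order: int = 6, word_order: int = 2, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF++ on a 0-100 scale (hypothetical helper)."""
    orders = [(char_ngrams, n) for n in range(1, char_order + 1)]
    orders += [(word_ngrams, n) for n in range(1, word_order + 1)]

    precisions, recalls = [], []
    for extract, n in orders:
        hyp, ref = extract(hypothesis, n), extract(reference, n)
        if not hyp or not ref:
            # Simplification: skip orders that one side is too short to have.
            continue
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))

    if not precisions:
        return 0.0
    # Average precision and recall over all n-gram orders, then combine.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)


print(round(chrf_pp("the cat sat", "the cat sat"), 1))  # 100.0
```

Because beta = 2, a hypothesis that omits reference material is penalized more than one that adds extra material: `chrf_pp("the cat sat", "the cat sat down")` scores lower than `chrf_pp("the cat sat down", "the cat sat")`.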
Anthology ID: 2026.silkroadnlp-1.12
Volume: The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Rayyan Merchant, Karine Megerdoomian
Venues: SilkRoadNLP, WS
Publisher: Association for Computational Linguistics
Pages: 114–120
URL: https://aclanthology.org/2026.silkroadnlp-1.12/
Cite (ACL): Dmitry Novokshanov, Innokentiy S. Humonen, and Ilya Makarov. 2026. Shughni Machine Translation Enhanced by Donor Languages. In The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, pages 114–120, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Shughni Machine Translation Enhanced by Donor Languages (Novokshanov et al., SilkRoadNLP 2026)
PDF: https://aclanthology.org/2026.silkroadnlp-1.12.pdf