Adapting Multilingual NMT to Language Isolates: The Role of Proxy Language Selection and Dialect Handling for Nivkh

Eleonora Izmailova, Alexey Sorokin, Pavel Grashchenkov


Abstract
Neural machine translation has achieved remarkable results for high-resource languages, yet language isolates – those with no demonstrated genetic relatives – remain severely underserved, as they cannot benefit from cross-lingual transfer with related languages. We present the first NMT system for Nivkh, a critically endangered language isolate spoken by fewer than 100 fluent speakers in the Russian Far East. Working with approximately 9.5k parallel sentences – expanded through fine-tuned LaBSE sentence alignment – we adapt NLLB-200 to Nivkh-Russian translation. Since Nivkh is absent from NLLB’s language inventory, we investigate proxy language token selection, comparing six typologically diverse languages: Bashkir, Kazakh, Halh Mongolian, Turkish, Tajik, and French. We find that using any proxy substantially outperforms random token initialization (BLEU 18-19.02 vs. 15.44 for rus→niv), confirming the value of proxy-based transfer. However, the choice of which proxy has minimal impact, with all six achieving comparable results despite spanning four language families and two scripts. This suggests that for language isolates, practitioners can select any typologically reasonable proxy without significant performance penalty. We additionally present preliminary experiments on dialect-specific models for Amur and Sakhalin Nivkh. Our findings establish baseline results for future Nivkh NLP research and provide practical guidance for adapting multilingual models to other language isolates.
Anthology ID:
2026.loresmt-1.11
Volume:
Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Atul Kr. Ojha, Chao-hong Liu, Ekaterina Vylomova, Flammie Pirinen, Jonathan Washington, Nathaniel Oco, Xiaobing Zhao
Venues:
LoResMT | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
127–137
Language:
URL:
https://aclanthology.org/2026.loresmt-1.11/
DOI:
Bibkey:
Cite (ACL):
Eleonora Izmailova, Alexey Sorokin, and Pavel Grashchenkov. 2026. Adapting Multilingual NMT to Language Isolates: The Role of Proxy Language Selection and Dialect Handling for Nivkh. In Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026), pages 127–137, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Adapting Multilingual NMT to Language Isolates: The Role of Proxy Language Selection and Dialect Handling for Nivkh (Izmailova et al., LoResMT 2026)
Copy Citation:
PDF:
https://aclanthology.org/2026.loresmt-1.11.pdf