BibTeX
@inproceedings{kyslyi-etal-2025-vuyko,
title = "Vuyko Mistral: Adapting {LLM}s for Low-Resource Dialectal Translation",
author = "Kyslyi, Roman and
Maksymiuk, Yuliia and
Pysmennyi, Ihor",
editor = "Romanyshyn, Mariana",
booktitle = "Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)",
month = jul,
year = "2025",
address = "Vienna, Austria (online)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.unlp-1.10/",
doi = "10.18653/v1/2025.unlp-1.10",
pages = "86--95",
ISBN = "979-8-89176-269-5",
abstract = "In this paper we introduce the first effort to adapt large language models (LLMs) to the Ukrainian dialect (in our case Hutsul), a low-resource and morphologically complex dialect spoken in the Carpathian Highlands. We created a parallel corpus of 9852 dialect-to-standard Ukrainian sentence pairs and a dictionary of 7320 dialectal word mappings. We also addressed data shortage by proposing an advanced Retrieval-Augmented Generation (RAG) pipeline to generate synthetic parallel translation pairs, expanding the corpus with 52142 examples. We have fine-tuned multiple open-source LLMs using LoRA and evaluated them on a standard-to-dialect translation task, also comparing with few-shot GPT-4o translation. In the absence of human annotators, we adopt a multi-metric evaluation strategy combining BLEU, chrF++, TER, and LLM-based judgment (GPT-4o). The results show that even small(7B) finetuned models outperform zero-shot baselines such as GPT-4o across both automatic and LLM-evaluated metrics. All data, models, and code are publicly released at: https://github.com/woters/vuyko-hutsul."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kyslyi-etal-2025-vuyko">
<titleInfo>
<title>Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Roman</namePart>
<namePart type="family">Kyslyi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuliia</namePart>
<namePart type="family">Maksymiuk</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ihor</namePart>
<namePart type="family">Pysmennyi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mariana</namePart>
<namePart type="family">Romanyshyn</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria (online)</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-269-5</identifier>
</relatedItem>
<abstract>In this paper we introduce the first effort to adapt large language models (LLMs) to the Ukrainian dialect (in our case Hutsul), a low-resource and morphologically complex dialect spoken in the Carpathian Highlands. We created a parallel corpus of 9852 dialect-to-standard Ukrainian sentence pairs and a dictionary of 7320 dialectal word mappings. We also addressed data shortage by proposing an advanced Retrieval-Augmented Generation (RAG) pipeline to generate synthetic parallel translation pairs, expanding the corpus with 52142 examples. We have fine-tuned multiple open-source LLMs using LoRA and evaluated them on a standard-to-dialect translation task, also comparing with few-shot GPT-4o translation. In the absence of human annotators, we adopt a multi-metric evaluation strategy combining BLEU, chrF++, TER, and LLM-based judgment (GPT-4o). The results show that even small(7B) finetuned models outperform zero-shot baselines such as GPT-4o across both automatic and LLM-evaluated metrics. All data, models, and code are publicly released at: https://github.com/woters/vuyko-hutsul.</abstract>
<identifier type="citekey">kyslyi-etal-2025-vuyko</identifier>
<identifier type="doi">10.18653/v1/2025.unlp-1.10</identifier>
<location>
<url>https://aclanthology.org/2025.unlp-1.10/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>86</start>
<end>95</end>
</extent>
</part>
</mods>
</modsCollection>

Endnote
%0 Conference Proceedings
%T Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation
%A Kyslyi, Roman
%A Maksymiuk, Yuliia
%A Pysmennyi, Ihor
%Y Romanyshyn, Mariana
%S Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria (online)
%@ 979-8-89176-269-5
%F kyslyi-etal-2025-vuyko
%X In this paper, we introduce the first effort to adapt large language models (LLMs) to a Ukrainian dialect (in our case, Hutsul), a low-resource and morphologically complex dialect spoken in the Carpathian Highlands. We created a parallel corpus of 9852 dialect-to-standard Ukrainian sentence pairs and a dictionary of 7320 dialectal word mappings. We also addressed the data shortage by proposing an advanced Retrieval-Augmented Generation (RAG) pipeline to generate synthetic parallel translation pairs, expanding the corpus with 52142 examples. We fine-tuned multiple open-source LLMs using LoRA and evaluated them on a standard-to-dialect translation task, also comparing them with few-shot GPT-4o translation. In the absence of human annotators, we adopted a multi-metric evaluation strategy combining BLEU, chrF++, TER, and LLM-based judgment (GPT-4o). The results show that even small (7B) fine-tuned models outperform zero-shot baselines such as GPT-4o across both automatic and LLM-evaluated metrics. All data, models, and code are publicly released at: https://github.com/woters/vuyko-hutsul.
%R 10.18653/v1/2025.unlp-1.10
%U https://aclanthology.org/2025.unlp-1.10/
%U https://doi.org/10.18653/v1/2025.unlp-1.10
%P 86-95

Markdown (Informal)
[Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation](https://aclanthology.org/2025.unlp-1.10/) (Kyslyi et al., UNLP 2025)
ACL
Roman Kyslyi, Yuliia Maksymiuk, and Ihor Pysmennyi. 2025. Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation. In Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025), pages 86–95, Vienna, Austria (online). Association for Computational Linguistics.
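
Reproducibility note: the automatic metrics named in the abstract (BLEU, chrF++, TER) can be computed with the sacrebleu Python library. The sketch below is illustrative only: the file names and variable names are assumptions, not part of the authors' released code (see https://github.com/woters/vuyko-hutsul for the official data and pipeline).

# Minimal sketch, assuming one model output (hypothesis) and one reference
# translation per line in plain-text files; the file names are hypothetical.
from sacrebleu.metrics import BLEU, CHRF, TER

with open("hypotheses.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

bleu = BLEU()                # corpus-level BLEU
chrfpp = CHRF(word_order=2)  # word_order=2 turns chrF into chrF++
ter = TER()                  # translation edit rate (lower is better)

# sacrebleu expects a list of reference streams, hence the extra nesting [refs].
for metric in (bleu, chrfpp, ter):
    print(metric.corpus_score(hyps, [refs]))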