Anton Shpigunov
2026
Improving Domain-Specific Translation from English into Ukrainian with Retrieval-Augmented Generation
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Large language models have demonstrated competence as translators, including for lower-resource languages such as Ukrainian. However, in specialized or novel domains, translation quality can suffer without adequate lexical and stylistic reference material. We present a retrieval-augmented approach to English-Ukrainian machine translation in a narrow domain: a private legal/military bilingual corpus. In this approach, semantically similar translation units are retrieved via vector embeddings and provided to the LLM as in-context examples. We evaluate three open-weight Gemma 3 models (4B, 12B, and 27B) against Gemini 3 Flash as a baseline across five augmentation conditions (k = 0, 3, 5, 10, and 25 retrieved examples), using a 2,581-pair retrieval index and a 258-pair test set. We find that context augmentation yields statistically significant improvements in both ChrF++ and COMET for all models; the smallest model’s COMET score improves by 0.076 at k = 3. However, smaller models exhibit context saturation: the 4B model’s performance peaks at k = 10 and degrades with additional context, losing 9.72 ChrF++ points and 0.007 COMET between k = 10 and k = 25, whereas larger models continue to benefit.
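The retrieval-and-prompting step summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline. The bag-of-words `embed` function is a toy stand-in for the neural vector embeddings the approach relies on. The names `TranslationUnit`, `retrieve`, and `build_prompt`, the prompt wording, and the three English-Ukrainian example pairs are all hypothetical. Setting k = 0 corresponds to the unaugmented (zero-shot) condition.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TranslationUnit:
    source: str  # English segment
    target: str  # Ukrainian reference translation


def embed(text: str) -> Counter:
    """Toy stand-in (assumption) for a neural sentence encoder: word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, index: list[TranslationUnit], k: int) -> list[TranslationUnit]:
    """Return up to k units whose English side is most similar to the query."""
    if k <= 0:
        return []  # k = 0: no augmentation
    q = embed(query)
    ranked = sorted(index, key=lambda tu: cosine(q, embed(tu.source)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: list[TranslationUnit]) -> str:
    """Assemble a hypothetical few-shot prompt from the retrieved units."""
    parts = ["Translate the following English text into Ukrainian."]
    if examples:
        parts.append("Reference translations from the same domain:")
        for ex in examples:
            parts.append(f"English: {ex.source}\nUkrainian: {ex.target}")
    parts.append(f"English: {query}\nUkrainian:")
    return "\n\n".join(parts)


# Hypothetical miniature index standing in for the 2,581-pair corpus.
index = [
    TranslationUnit("The contract enters into force on signature.",
                    "Договір набирає чинності з моменту підписання."),
    TranslationUnit("The unit was deployed to the front line.",
                    "Підрозділ було розгорнуто на лінії фронту."),
    TranslationUnit("The parties shall sign the contract.",
                    "Сторони підписують договір."),
]
query = "When does the contract enter into force?"
top = retrieve(query, index, k=2)
prompt = build_prompt(query, top)
```

Here the two contract-related units outrank the military one because they share more words with the query; a real system would instead compare dense embeddings, typically with an approximate nearest-neighbour index once the corpus grows.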