Improving Domain-Specific Translation from English into Ukrainian with Retrieval-Augmented Generation

Anton Shpigunov


Abstract
Large language models have demonstrated competence as language translators, including for lower-resourced languages like Ukrainian. However, in specialized or novel domains, translation quality can suffer without adequate lexical and stylistic reference material. We present a retrieval-augmented approach to English-Ukrainian machine translation in a narrow domain: a private legal/military bilingual corpus. In this approach, semantically similar translation units retrieved via vector embeddings are provided as in-context examples to the LLM. We evaluate three open-weight Gemma 3 models (4B, 12B, and 27B) against Gemini 3 Flash as a baseline across five augmentation conditions, where k, the number of retrieved examples, is 0, 3, 5, 10, or 25, on a 2,581-pair index and a 258-pair test set. We find that context augmentation yields statistically significant improvements in both ChrF++ and COMET for all models, with the smallest model’s COMET score improving by 0.076 at k = 3. However, smaller models exhibit context saturation: the 4B model’s performance peaks at k = 10 and degrades with additional context, losing 9.72 ChrF++ points and 0.007 COMET between k = 10 and k = 25, while larger models continue to benefit.
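The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the embedding model, the similarity measure, or the prompt template, so the character-trigram `embed` function, the cosine ranking, the prompt wording, and the three toy translation units below are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Toy translation units (English, Ukrainian). Invented for illustration;
# the paper's index is a private 2,581-pair legal/military corpus.
INDEX = [
    ("The contract enters into force on the date of signature.",
     "Договір набирає чинності з дати підписання."),
    ("The commander approved the order.",
     "Командир затвердив наказ."),
    ("The weather is nice today.",
     "Сьогодні гарна погода."),
]

def embed(text: str) -> Counter:
    """Stand-in for a sentence embedding: character-trigram counts (assumption)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[key] for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(source: str, index: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Return the k units whose English side is most similar to `source`."""
    if k <= 0:
        return []  # k = 0 corresponds to the unaugmented condition
    query = embed(source)
    ranked = sorted(index, key=lambda unit: cosine(query, embed(unit[0])), reverse=True)
    return ranked[:k]

def build_prompt(source: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot prompt; the wording is hypothetical."""
    parts = ["Translate from English into Ukrainian."]
    parts += [f"English: {en}\nUkrainian: {uk}" for en, uk in examples]
    parts.append(f"English: {source}\nUkrainian:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    sentence = "The contract enters into force upon signature."
    print(build_prompt(sentence, retrieve(sentence, INDEX, k=2)))
```

In a real setup, `embed` would be replaced by a dense sentence-embedding model and the sorted scan by a vector index; varying `k` then reproduces the kind of augmentation conditions the abstract compares.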
Anthology ID:
2026.unlp-1.1
Volume:
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Month:
May
Year:
2026
Address:
Lviv, Ukraine
Editor:
Mariana Romanyshyn
Venue:
UNLP
Publisher:
Association for Computational Linguistics
Pages:
1–11
URL:
https://aclanthology.org/2026.unlp-1.1/
Cite (ACL):
Anton Shpigunov. 2026. Improving Domain-Specific Translation from English into Ukrainian with Retrieval-Augmented Generation. In Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026), pages 1–11, Lviv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Improving Domain-Specific Translation from English into Ukrainian with Retrieval-Augmented Generation (Shpigunov, UNLP 2026)
PDF:
https://aclanthology.org/2026.unlp-1.1.pdf