Isolating LLM Performance Gains in Pre-training versus Instruction-tuning for Mid-resource Languages: The Ukrainian Benchmark Study

Yurii Paniv


Abstract
This paper evaluates language model performance on Ukrainian language tasks across multiple downstream benchmarks, including summarization, closed and open question answering, and translation at both the sentence and paragraph levels. We also introduce LongFlores, an extension of the FLORES benchmark designed specifically to assess paragraph-level translation capabilities. In our experiments, we compare the performance of base models against their instruction-tuned counterparts to isolate and quantify the source of performance improvements on Ukrainian language tasks. Our findings reveal that, for popular open-source models, base models evaluated in the few-shot setting outperform their instruction-tuned counterparts evaluated in the zero-shot setting. This suggests that less attention is paid to Ukrainian during the instruction-tuning phase, providing valuable insights for future model development and optimization for Ukrainian and, potentially, other lower-resourced languages.
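To make the base-versus-instruct comparison concrete, the sketch below contrasts few-shot prompting of a base checkpoint with zero-shot prompting of its instruction-tuned counterpart on a single Ukrainian-to-English sentence translation example. This is a minimal illustration, not the paper's evaluation harness: the model names, prompts, and example pairs are placeholders, and a real benchmark run would iterate over a dataset such as FLORES and score outputs with a translation metric.

```python
# Minimal sketch (illustrative only): few-shot prompting of a base model vs.
# zero-shot prompting of an instruction-tuned model for uk->en translation.
# Model names below are hypothetical placeholders, not the paper's checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Llama-3.1-8B"                # placeholder base checkpoint
INSTRUCT_MODEL = "meta-llama/Llama-3.1-8B-Instruct"   # placeholder instruct checkpoint

FEW_SHOT_EXAMPLES = [
    ("Добрий день!", "Good afternoon!"),
    ("Я люблю читати книжки.", "I like reading books."),
]
SOURCE_SENTENCE = "Сьогодні гарна погода."

def generate(model_name: str, prompt: str, chat: bool) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    if chat:
        # Zero-shot: rely on the instruction-tuned chat template alone.
        messages = [{"role": "user", "content": prompt}]
        inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
    else:
        # Few-shot: plain completion prompt for the base model.
        inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output = model.generate(inputs, max_new_tokens=64, do_sample=False)
    # Decode only the newly generated continuation.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Few-shot prompt: translation pairs followed by the new source sentence.
few_shot_prompt = "".join(
    f"Ukrainian: {uk}\nEnglish: {en}\n\n" for uk, en in FEW_SHOT_EXAMPLES
) + f"Ukrainian: {SOURCE_SENTENCE}\nEnglish:"

# Zero-shot instruction for the instruction-tuned model.
zero_shot_prompt = f"Translate the following Ukrainian sentence into English: {SOURCE_SENTENCE}"

print("Base, few-shot:", generate(BASE_MODEL, few_shot_prompt, chat=False))
print("Instruct, zero-shot:", generate(INSTRUCT_MODEL, zero_shot_prompt, chat=True))
```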
Anthology ID:
2025.ranlp-1.100
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
876–883
URL:
https://aclanthology.org/2025.ranlp-1.100/
Cite (ACL):
Yurii Paniv. 2025. Isolating LLM Performance Gains in Pre-training versus Instruction-tuning for Mid-resource Languages: The Ukrainian Benchmark Study. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 876–883, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Isolating LLM Performance Gains in Pre-training versus Instruction-tuning for Mid-resource Languages: The Ukrainian Benchmark Study (Paniv, RANLP 2025)
PDF:
https://aclanthology.org/2025.ranlp-1.100.pdf