RAG Pipeline Strategies for Ukrainian Multi-Domain Document Understanding Task

Mykola Nosenko, Pavlo Kilko


Abstract
In this work, we present a top-performing solution to the UNLP 2026 Shared Task on Ukrainian Multi-Domain Document Understanding. The task requires answering multiple-choice questions grounded in domain-specific Ukrainian documents while also identifying the source document and page that support the answer. We developed a modular retrieval-augmented generation (RAG) pipeline and conducted a series of ablation experiments over its individual components to identify the best-performing strategy at each stage. Based on our evaluation results, we propose two final pipeline configurations that trade off computational cost against retrieval accuracy: a stronger but more compute-intensive document-level augmentation approach, and a lighter summary-based augmentation approach suited to resource-constrained environments. Our submission achieved 3rd place on the private leaderboard. This demonstrates that curating each RAG component in isolation can yield strong performance on Ukrainian document-grounded question answering without additional adaptation of the underlying language models.
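The abstract does not specify which retrievers, models, or augmentation methods were used, so the following is only a minimal sketch of what a modular pipeline with swappable stages might look like for this task: retrieve candidate pages, choose an answer option, and report the best-ranked page as the evidence location. All names (`Page`, `overlap_retriever`, `overlap_answerer`, `answer_question`) are hypothetical, the toy pages are invented for illustration, and simple token-overlap scoring is swapped in as a stand-in for a real retriever and a real language model.

```python
import re
from collections import Counter
from dataclasses import dataclass
from string import ascii_uppercase
from typing import Callable, List, Sequence, Tuple


@dataclass
class Page:
    """One retrievable unit: a single page of a source document (hypothetical schema)."""
    doc_id: str
    page: int
    text: str


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; \w matches Cyrillic letters for str patterns.
    return re.findall(r"\w+", text.lower())


def overlap_retriever(query: str, pages: Sequence[Page], k: int) -> List[Page]:
    """Hypothetical stand-in retriever: rank pages by token overlap with the query."""
    q = Counter(tokenize(query))

    def score(p: Page) -> int:
        t = Counter(tokenize(p.text))
        return sum(min(q[w], t[w]) for w in q)

    # sorted() is stable, so ties keep their original page order.
    return sorted(pages, key=score, reverse=True)[:k]


def overlap_answerer(question: str, options: Sequence[str], context: Sequence[Page]) -> int:
    """Hypothetical stand-in for an LLM: pick the option with most tokens found in the context."""
    ctx = Counter(w for p in context for w in tokenize(p.text))
    scores = [sum(ctx[w] > 0 for w in set(tokenize(o))) for o in options]
    return max(range(len(options)), key=scores.__getitem__)


Retriever = Callable[[str, Sequence[Page], int], List[Page]]
Answerer = Callable[[str, Sequence[str], Sequence[Page]], int]


def answer_question(
    question: str,
    options: Sequence[str],
    pages: Sequence[Page],
    retriever: Retriever = overlap_retriever,
    answerer: Answerer = overlap_answerer,
    k: int = 2,
) -> Tuple[str, str, int]:
    """Return (answer letter, source doc_id, source page); each stage can be swapped independently."""
    if not pages:
        raise ValueError("no pages to retrieve from")
    hits = retriever(question, pages, k)
    choice = answerer(question, options, hits)
    top = hits[0]  # the best-ranked page doubles as the predicted evidence location
    return ascii_uppercase[choice], top.doc_id, top.page


# Toy pages invented for illustration only.
pages = [
    Page("tax_code.pdf", 1, "Ставка податку на додану вартість становить 20 відсотків."),
    Page("tax_code.pdf", 2, "Платники податку реєструються в контролюючому органі."),
    Page("med_guide.pdf", 5, "Дозування препарату для дорослих становить 500 мг."),
]
result = answer_question(
    "Яка ставка податку на додану вартість?",
    ["10 відсотків", "20 відсотків", "500 мг"],
    pages,
)
print(result)  # ('B', 'tax_code.pdf', 1)
```

Passing the retriever and answerer as callables lets one stage be replaced while the others stay fixed, which is the kind of per-component comparison the ablations describe. The document-level and summary-based augmentation steps are not modeled here, since the abstract does not say how they are implemented.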
Anthology ID: 2026.unlp-1.21
Volume: Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Month: May
Year: 2026
Address: Lviv, Ukraine
Editor: Mariana Romanyshyn
Venue: UNLP
Publisher: Association for Computational Linguistics
Pages: 240–248
URL: https://aclanthology.org/2026.unlp-1.21/
Cite (ACL): Mykola Nosenko and Pavlo Kilko. 2026. RAG Pipeline Strategies for Ukrainian Multi-Domain Document Understanding Task. In Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026), pages 240–248, Lviv, Ukraine. Association for Computational Linguistics.
Cite (Informal): RAG Pipeline Strategies for Ukrainian Multi-Domain Document Understanding Task (Nosenko & Kilko, UNLP 2026)
PDF: https://aclanthology.org/2026.unlp-1.21.pdf