Enhancing Retrieval-Augmented Generation: A Study of Best Practices

Siran Li; Linus Stenzel; Carsten Eickhoff; Seyed Ali Bahrainian

Abstract

Retrieval-Augmented Generation (RAG) systems have recently advanced considerably by integrating retrieval mechanisms into language models, enhancing their ability to produce more accurate and contextually relevant responses. However, the influence of the various components and configurations within RAG systems remains underexplored. A comprehensive understanding of these elements is essential for tailoring RAG systems to complex retrieval tasks and ensuring optimal performance across diverse applications. In this paper, we develop several advanced RAG system designs that incorporate query expansion, several novel retrieval strategies, and a novel Contrastive In-Context Learning RAG. Our study systematically investigates key factors, including language model size, prompt design, document chunk size, knowledge base size, retrieval stride, query expansion techniques, Contrastive In-Context Learning knowledge bases, multilingual knowledge bases, and Focus Mode, which retrieves relevant context at the sentence level. Through extensive experimentation, we provide a detailed analysis of how these factors influence response quality. Our findings offer actionable insights for developing RAG systems that strike a balance between contextual richness and retrieval-generation efficiency, paving the way for more adaptable and high-performing RAG frameworks in diverse real-world scenarios. Our code and implementation details are publicly available.
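The abstract names Focus Mode, which retrieves relevant context at the sentence level rather than passing whole chunks to the generator. The sketch below only illustrates that general idea under stated assumptions: sentences are scored against the query with a simple bag-of-words cosine similarity, which is a stand-in and not the retriever used in the paper, and the function names (`split_sentences`, `focus_retrieve`) and the parameter `k` are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def focus_retrieve(query, documents, k=2):
    # Hypothetical sentence-level retrieval: score every sentence of every
    # document against the query and keep only the k highest-scoring sentences.
    query_vec = Counter(tokenize(query))
    scored = []
    for doc in documents:
        for sentence in split_sentences(doc):
            scored.append((cosine(query_vec, Counter(tokenize(sentence))), sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]


docs = [
    "Paris is the capital of France. The city hosts the Louvre.",
    "Berlin is the capital of Germany. It is known for its museums.",
]
print(focus_retrieve("What is the capital of France?", docs, k=1))
```

Returning individual sentences instead of full chunks keeps the prompt short and drops off-topic sentences (here, the one about the Louvre), at the cost of losing surrounding context that a chunk would have carried.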

Anthology ID: 2025.coling-main.449
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 6705–6717
URL: https://aclanthology.org/2025.coling-main.449/
Cite (ACL): Siran Li, Linus Stenzel, Carsten Eickhoff, and Seyed Ali Bahrainian. 2025. Enhancing Retrieval-Augmented Generation: A Study of Best Practices. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6705–6717, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Enhancing Retrieval-Augmented Generation: A Study of Best Practices (Li et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.449.pdf