Investigating the Robustness of Retrieval-Augmented Generation at the Query Level

Sezen Perçin; Xin Su; Qutub Sha Syed; Phillip Howard; Aleksei Kuvshinov; Leo Schwinn; Kay-Ulrich Scholl

Abstract

Large language models (LLMs) are costly and inefficient to update with new information. To address this limitation, retrieval-augmented generation (RAG) has been proposed as a solution that dynamically incorporates external knowledge during inference, improving factual consistency and reducing hallucinations. Despite its promise, RAG systems face practical challenges, most notably a strong dependence on the quality of the input query for accurate retrieval. In this paper, we investigate the sensitivity of different components of the RAG pipeline to various types of query perturbations. Our analysis reveals that the performance of commonly used retrievers can degrade significantly even under minor query variations. We study each module in isolation as well as their combined effect in an end-to-end question answering setting, using both general-domain and domain-specific datasets. Additionally, we propose an evaluation framework to systematically assess the query-level robustness of RAG pipelines and offer actionable recommendations for practitioners based on the results of more than 1092 experiments.
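The abstract does not say which perturbations, retrievers, or metrics the framework uses. As a rough illustration of what "query-level robustness" of a retriever means, the sketch below perturbs a query and compares retrieval quality before and after. It assumes a character-swap typo as the example perturbation, a toy lexical-overlap retriever standing in for a real one, and recall@k as the metric; all function names (`typo_perturb`, `toy_retrieve`, `recall_at_k`, `relative_drop`) are hypothetical and not taken from the paper.

```python
import random


def typo_perturb(query: str, rate: float = 0.1, seed: int = 0) -> str:
    """Hypothetical perturbation: swap adjacent letters with probability `rate`."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(query)
    i = 0
    while i < len(chars) - 1:
        # Only swap pairs where both characters are letters (stays inside words).
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def toy_retrieve(query: str, corpus: dict[str, str]) -> list[str]:
    """Stand-in retriever: rank documents by word overlap with the query."""
    q_tokens = set(query.lower().split())
    scores = {
        doc_id: len(q_tokens & set(text.lower().split()))
        for doc_id, text in corpus.items()
    }
    # Highest overlap first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked_ids: list[str], relevant_ids: list[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def relative_drop(clean: float, perturbed: float) -> float:
    """Relative performance loss caused by the perturbation."""
    return (clean - perturbed) / clean if clean > 0 else 0.0


if __name__ == "__main__":
    corpus = {
        "d1": "darwin wrote on the origin of species",
        "d2": "the theory of relativity",
        "d3": "species of birds",
    }
    query = "who wrote origin of species"
    noisy = typo_perturb(query, rate=0.3, seed=1)
    clean_r = recall_at_k(toy_retrieve(query, corpus), ["d1"], k=1)
    noisy_r = recall_at_k(toy_retrieve(noisy, corpus), ["d1"], k=1)
    print(noisy, clean_r, noisy_r, relative_drop(clean_r, noisy_r))
```

A lexical retriever is used here only because it makes the effect easy to see: a swapped letter turns a matching token into a non-matching one. Dense retrievers respond to such noise differently, and a real evaluation would average the drop over many queries, perturbation types, and seeds.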

Anthology ID:: 2025.gem-1.38
Volume:: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month:: July
Year:: 2025
Address:: Vienna, Austria and virtual meeting
Editors:: Ofir Arviv, Miruna Clinciu, Kaustubh Dhole, Rotem Dror, Sebastian Gehrmann, Eliya Habba, Itay Itzhak, Simon Mille, Yotam Perlitz, Enrico Santus, João Sedoc, Michal Shmueli Scheuer, Gabriel Stanovsky, Oyvind Tafjord
Venues:: GEM | WS
Publisher:: Association for Computational Linguistics
Pages:: 439–457
URL:: https://aclanthology.org/2025.gem-1.38/
Cite (ACL):: Sezen Perçin, Xin Su, Qutub Sha Syed, Phillip Howard, Aleksei Kuvshinov, Leo Schwinn, and Kay-Ulrich Scholl. 2025. Investigating the Robustness of Retrieval-Augmented Generation at the Query Level. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 439–457, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal):: Investigating the Robustness of Retrieval-Augmented Generation at the Query Level (Perçin et al., GEM 2025)
PDF:: https://aclanthology.org/2025.gem-1.38.pdf
