Retrieval-Augmented Generation with Estimation of Source Reliability

Jeongyeon Hwang; Junyoung Park; Hyejin Park; Dongwoo Kim; Sangdon Park; Jungseul Ok

doi:10.18653/v1/2025.emnlp-main.1738

Abstract

Retrieval-Augmented Generation (RAG) is an effective approach to enhance the factual accuracy of large language models (LLMs) by retrieving information from external databases, which are typically composed of diverse sources, to supplement the limited internal knowledge of LLMs. However, standard RAG risks retrieving incorrect information, as it relies solely on the relevance between a query and a document and overlooks the heterogeneous reliability of these sources. To address this issue, we propose Reliability-Aware RAG (RA-RAG), a new multi-source RAG framework that estimates the reliability of sources and leverages this information to prioritize highly reliable and relevant documents, ensuring more robust and accurate response generation. Specifically, RA-RAG first estimates source reliability by cross-checking information across multiple sources. It then retrieves documents from the top-𝜅 reliable and relevant sources and aggregates their information using weighted majority voting (WMV), where the selective retrieval ensures scalability without compromising performance. Comprehensive experiments show that RA-RAG consistently outperforms baselines in scenarios with heterogeneous source reliability while scaling efficiently as the number of sources increases. Furthermore, we demonstrate the ability of RA-RAG to estimate the reliability of real-world sources, highlighting its practical applicability. Our code and data are available at RA-RAG.
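
The aggregation step described in the abstract can be illustrated with a minimal sketch of weighted majority voting over the top-𝜅 sources. This is not the authors' implementation: the function name, the input format, and the log-odds weighting are assumptions made for illustration, the reliability values are taken as given rather than estimated by cross-checking, and query relevance is ignored when ranking sources.

```python
import math
from collections import defaultdict


def weighted_majority_vote(answers, reliability, kappa):
    """Return the answer with the largest total weight among the top-kappa sources.

    answers: dict mapping source id -> answer string (hypothetical input format)
    reliability: dict mapping source id -> estimated reliability in (0, 1)
    kappa: number of most reliable sources to consult
    """
    # Keep only the kappa sources with the highest estimated reliability.
    top_sources = sorted(answers, key=lambda s: reliability[s], reverse=True)[:kappa]

    scores = defaultdict(float)
    for source in top_sources:
        r = reliability[source]
        # Assumed weighting: log-odds of reliability, positive when r > 0.5.
        scores[answers[source]] += math.log(r / (1.0 - r))

    return max(scores, key=scores.get)


# Toy data (made up for illustration): one reliable source says "Paris",
# two weaker sources say "Lyon", and the least reliable source is dropped.
answers = {"a": "Paris", "b": "Lyon", "c": "Lyon", "d": "Paris"}
reliability = {"a": 0.9, "b": 0.6, "c": 0.6, "d": 0.3}
print(weighted_majority_vote(answers, reliability, kappa=3))  # prints Paris
```

In this toy case an unweighted vote over the same three sources would return "Lyon"; the single highly reliable source outweighs the two weaker ones once votes are weighted.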

Anthology ID: 2025.emnlp-main.1738
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 34279–34303
URL: https://aclanthology.org/2025.emnlp-main.1738/
DOI: 10.18653/v1/2025.emnlp-main.1738
Cite (ACL): Jeongyeon Hwang, Junyoung Park, Hyejin Park, Dongwoo Kim, Sangdon Park, and Jungseul Ok. 2025. Retrieval-Augmented Generation with Estimation of Source Reliability. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34279–34303, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Retrieval-Augmented Generation with Estimation of Source Reliability (Hwang et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1738.pdf
Checklist: 2025.emnlp-main.1738.checklist.pdf
