An Automatic Method to Estimate Correctness of RAG

Chi Zhang, Vivek V. Datla, Aditya Shrivastava, Alfy Samuel, Zhiqi Huang, Anoop Kumar, Daben Liu


Abstract
In sectors in where data quality is critical, like finance and healthcare, it is crucial to have confidence in not only the outputs generated by retrieval-augmented generation (RAG) models but also the process followed by the model while arriving at the output. Existing methods, such as hallucination detection and input-output entailment measurements, fail to capture the model’s internal state during answer generation. This paper introduces a novel approach to predict the correctness of the generated answer by modeling the model’s uncertainty on quantified perturbations of input. Extensive experiments across multiple large language models (LLMs) demonstrate that our approach quantifies RAG robustness by aligning predictions with ground truth with a Avg.Mean Square Error (MSE) 0.002 while offering flexibility for diverse qualitative metrics.
Anthology ID:
2025.coling-industry.52
Volume:
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue:
COLING
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
603–611
Language:
URL:
https://aclanthology.org/2025.coling-industry.52/
DOI:
Bibkey:
Cite (ACL):
Chi Zhang, Vivek V. Datla, Aditya Shrivastava, Alfy Samuel, Zhiqi Huang, Anoop Kumar, and Daben Liu. 2025. An Automatic Method to Estimate Correctness of RAG. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 603–611, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
An Automatic Method to Estimate Correctness of RAG (Zhang et al., COLING 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.coling-industry.52.pdf