Daben Liu


2025

An Automatic Method to Estimate Correctness of RAG
Chi Zhang | Vivek V. Datla | Aditya Shrivastava | Alfy Samuel | Zhiqi Huang | Anoop Kumar | Daben Liu
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

In sectors where data quality is critical, such as finance and healthcare, it is crucial to have confidence not only in the outputs generated by retrieval-augmented generation (RAG) models but also in the process the model follows to arrive at those outputs. Existing methods, such as hallucination detection and input-output entailment measurements, fail to capture the model’s internal state during answer generation. This paper introduces a novel approach to predicting the correctness of the generated answer by modeling the model’s uncertainty over quantified perturbations of the input. Extensive experiments across multiple large language models (LLMs) demonstrate that our approach quantifies RAG robustness, aligning predictions with ground truth with an average Mean Squared Error (MSE) of 0.002, while offering flexibility for diverse qualitative metrics.
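The abstract does not spell out the perturbation or uncertainty machinery, so the sketch below is only a minimal illustration of the general idea, not the authors' implementation: retrieved passages are randomly dropped as a quantified input perturbation, and the variability of the regenerated answers is used as an uncertainty signal that could be mapped to a correctness estimate. The names generate_answer, perturb, and uncertainty_score are hypothetical, and generate_answer is a toy stub standing in for an actual RAG pipeline.

    # Illustrative sketch: estimate answer stability under quantified
    # perturbations of the retrieved context (hypothetical helper names).
    import random
    from difflib import SequenceMatcher

    def generate_answer(question: str, passages: list[str]) -> str:
        # Placeholder for an LLM call conditioned on the retrieved passages.
        return " ".join(passages[:2]) + " | " + question

    def perturb(passages: list[str], drop_rate: float, seed: int) -> list[str]:
        # Quantified perturbation: randomly drop a fraction of the passages.
        rng = random.Random(seed)
        kept = [p for p in passages if rng.random() > drop_rate]
        return kept or passages[:1]

    def uncertainty_score(question: str, passages: list[str],
                          drop_rate: float = 0.3, n_samples: int = 8) -> float:
        # Higher disagreement across perturbed inputs -> higher uncertainty,
        # which a correctness predictor would map to lower expected accuracy.
        base = generate_answer(question, passages)
        sims = []
        for seed in range(n_samples):
            ans = generate_answer(question, perturb(passages, drop_rate, seed))
            sims.append(SequenceMatcher(None, base, ans).ratio())
        return 1.0 - sum(sims) / len(sims)  # 0 = fully stable, 1 = unstable

    if __name__ == "__main__":
        docs = ["Paris is the capital of France.",
                "France is in Western Europe.",
                "The Eiffel Tower is in Paris."]
        print(uncertainty_score("What is the capital of France?", docs))

In the paper's setting, such a score would be regressed against ground-truth correctness (the reported MSE of 0.002 measures that alignment); the string-similarity measure here is just one interchangeable choice of qualitative metric.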