Beyond Black-Box Interventions: Latent Probing for Faithful Retrieval-Augmented Generation

Linfeng Gao; Qinggang Zhang; Baolong Bi; Bo Zeng; Zheng Yuan; Zerui Chen; Zhimin Wei; Shenghua Liu; Linlong Xu; Longyue Wang; Weihua Luo; Jinsong Su

Abstract

Retrieval-Augmented Generation (RAG) systems often fail to maintain contextual faithfulness, generating responses that conflict with the provided context. Existing methods attempt to improve faithfulness through external interventions, such as specialized prompting, decoding-based calibration, or preference optimization. However, since these approaches treat the LLM as a black box, they lack a reliable mechanism to assess how these conflicts occur. Consequently, they tend to be brittle, data-intensive, and agnostic to the model’s internal reasoning process. In this paper, we move beyond black-box interventions to analyze the model’s internal reasoning process. We discover that conflicting and aligned knowledge states are linearly separable in the model’s latent space, and contextual noise systematically increases the entropy of these representations. Based on these findings, we propose ProbeRAG, a novel framework for faithful RAG that operates in three stages: (i) fine-grained knowledge pruning to filter irrelevant context, (ii) latent conflict probing to identify hard conflicts in the model’s latent space, and (iii) conflict-aware attention to modulate attention heads toward faithful context integration. Extensive experiments demonstrate that ProbeRAG substantially improves both accuracy and contextual faithfulness. The related resources are available at https://github.com/XMUDeepLIT/ProbeRAG.
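The linear-separability claim in the abstract can be illustrated with a linear probe. What follows is a minimal sketch on synthetic data, not the ProbeRAG implementation: the "hidden states" are random Gaussian vectors standing in for real model activations, and the names `train_linear_probe` and `probe_accuracy`, the dimensionality, and the cluster offset are all assumptions made for illustration. It only shows what it means for two classes of representations to be separable by a single hyperplane.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe (w, b) by gradient descent.

    X: (n, d) feature matrix; y: (n,) labels in {0, 1}.
    Hypothetical helper for illustration only.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the logits
        grad = probs - y                             # d(loss)/d(logit)
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y, w, b):
    """Fraction of points on the correct side of the hyperplane."""
    preds = (X @ w + b > 0).astype(int)
    return float((preds == y).mean())

# Synthetic stand-ins for latent states (assumption: real activations would
# be extracted from a chosen layer of the LLM, which this sketch does not do).
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

aligned = rng.normal(size=(200, d)) + 3.0 * direction      # label 0
conflicting = rng.normal(size=(200, d)) - 3.0 * direction  # label 1
X = np.vstack([aligned, conflicting])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = train_linear_probe(X, y)
accuracy = probe_accuracy(X, y, w, b)
print(f"probe accuracy on synthetic states: {accuracy:.3f}")
```

If a probe this simple reaches high accuracy on real activations, the two knowledge states are approximately linearly separable at that layer; on real data, accuracy should be measured on held-out examples rather than on the training set as done here.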

Anthology ID: 2026.findings-acl.1499
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 29981–30000
URL: https://aclanthology.org/2026.findings-acl.1499/
Cite (ACL): Linfeng Gao, Qinggang Zhang, Baolong Bi, Bo Zeng, Zheng Yuan, Zerui Chen, Zhimin Wei, Shenghua Liu, Linlong Xu, Longyue Wang, Weihua Luo, and Jinsong Su. 2026. Beyond Black-Box Interventions: Latent Probing for Faithful Retrieval-Augmented Generation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 29981–30000, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Beyond Black-Box Interventions: Latent Probing for Faithful Retrieval-Augmented Generation (Gao et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1499.pdf
Checklist: 2026.findings-acl.1499.checklist.pdf
