Decomposed Prompting Does Not Fix Knowledge Gaps, But Helps Models Say “I Don’t Know”

Dhruv Madhwal; Lyuxin David Zhang; Dan Roth; Tomer Wolfson; Vivek Gupta

Abstract

Large language models often struggle to recognize their knowledge limits in closed-book question answering, leading to confident hallucinations. While decomposed prompting is typically used to improve accuracy, we investigate its impact on reliability. We evaluate three task-equivalent prompting regimes: Direct, Assistive, and Incremental, across different model scales and multi-hop QA benchmarks. We find that although accuracy gains from decomposition diminish in frontier models, disagreements between prompting regimes remain highly indicative of potential errors. Because factual knowledge is typically stable while hallucinations are stochastic, cross-regime agreement provides a precise signal of internal uncertainty. We leverage this signal to implement a training-free abstention policy that requires no retrieval or fine-tuning. Our results show that disagreement-based abstention outperforms standard uncertainty baselines as an error detector, improving both F1 and AUROC across settings. This demonstrates that decomposition-based prompting can serve as a practical diagnostic probe for model reliability in closed-book QA.
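The abstention policy described in the abstract can be illustrated with a minimal sketch. It assumes that the answers from the three regimes have already been collected as strings, that agreement means an exact match after simple normalization, and that the model abstains unless all regimes agree. The function names `normalize` and `answer_or_abstain`, the normalization steps, and the unanimity rule are hypothetical choices for illustration; the abstract does not specify how the paper compares answers or sets its decision rule.

```python
import re
import string
from typing import Mapping, Optional


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: a simple exact-match normalization; the paper may use a
    different matching scheme.
    """
    text = answer.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_or_abstain(answers: Mapping[str, str]) -> Optional[str]:
    """Return the first regime's answer if every regime agrees, else None.

    `answers` maps a regime name (e.g. "direct", "assistive", "incremental")
    to that regime's final answer. None means "abstain".
    Assumption: unanimity is required; an empty answer counts as disagreement.
    """
    normalized = {normalize(a) for a in answers.values()}
    if len(normalized) == 1 and "" not in normalized:
        return next(iter(answers.values()))
    return None


if __name__ == "__main__":
    agree = {"direct": "Paris", "assistive": "paris.", "incremental": "The Paris"}
    differ = {"direct": "Paris", "assistive": "Lyon", "incremental": "Paris"}
    print(answer_or_abstain(agree))
    print(answer_or_abstain(differ))
```

Requiring unanimity is the strictest possible rule: it trades coverage for precision. A majority vote or a graded agreement score would be alternative designs under the same idea.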

Anthology ID: 2026.findings-acl.1829
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 36688–36710
URL: https://aclanthology.org/2026.findings-acl.1829/
Cite (ACL): Dhruv Madhwal, Lyuxin David Zhang, Dan Roth, Tomer Wolfson, and Vivek Gupta. 2026. Decomposed Prompting Does Not Fix Knowledge Gaps, But Helps Models Say “I Don’t Know”. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36688–36710, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Decomposed Prompting Does Not Fix Knowledge Gaps, But Helps Models Say “I Don’t Know” (Madhwal et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1829.pdf
Checklist: 2026.findings-acl.1829.checklist.pdf