Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions

Haochen Shi; Shaobo Li; Guoqing Chao; Xiaoliang Shi; Wentao Chen; Zhenzhou Ji

doi:10.18653/v1/2025.emnlp-main.1479

Abstract

Large Language Models (LLMs) require robust evaluation. However, existing frameworks often rely on curated datasets that, once public, may be accessed by newer LLMs. This creates a risk of data leakage, where test sets inadvertently become part of training data, compromising evaluation fairness and integrity. To mitigate this issue, we propose Behave as Claimed (BaC), a novel evaluation framework inspired by counterfactual reasoning. BaC constructs a “what-if” scenario where LLMs respond to counterfactual questions about how they would behave if the input were manipulated. We refer to these responses as claims, which are verifiable by observing the LLMs’ actual behavior when given the manipulated input. BaC dynamically generates and verifies counterfactual questions using various few-shot in-context learning evaluation datasets, reducing their susceptibility to data leakage. Moreover, BaC provides a more challenging evaluation paradigm for LLMs. LLMs must thoroughly understand the prompt, the task, and the consequences of their responses to achieve better performance. We evaluate several state-of-the-art LLMs and find that, while most perform well on the original datasets, they struggle with BaC. This suggests that LLMs usually fail to align their claims with their actual behavior and that high performance on standard datasets may be less stable than previously assumed.
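The claim-versus-behavior check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Model` interface, the `Example` fields, the wording of the counterfactual question, and the exact-match comparison are all assumptions made for illustration, and the abstract does not specify how inputs are manipulated or how claims are scored.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a model maps a prompt string to an answer string.
Model = Callable[[str], str]


@dataclass
class Example:
    prompt: str       # original few-shot prompt (assumed format)
    manipulated: str  # the same prompt after some manipulation (assumed)


def counterfactual_question(ex: Example) -> str:
    """Ask the model how it would answer if the input were manipulated.

    The template below is an assumption, not the wording used in the paper.
    """
    return (
        f"{ex.prompt}\n\n"
        "Suppose the input were changed to:\n"
        f"{ex.manipulated}\n"
        "What would you answer? Reply with the answer only."
    )


def claim_matches_behavior(model: Model, ex: Example) -> bool:
    """Compare the model's claim with its actual behavior on the manipulated input."""
    claim = model(counterfactual_question(ex)).strip().lower()
    behavior = model(ex.manipulated).strip().lower()
    # Exact string match is a simplifying assumption for this sketch.
    return claim == behavior


def consistency_rate(model: Model, examples: List[Example]) -> float:
    """Fraction of examples where claim and behavior agree (0.0 for an empty list)."""
    if not examples:
        return 0.0
    matches = sum(claim_matches_behavior(model, ex) for ex in examples)
    return matches / len(examples)
```

Because the manipulated inputs are produced at evaluation time, a score of this kind depends on the model's behavior on fresh prompts rather than on answers it may have memorized from a public test set, which is the motivation the abstract gives.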

Anthology ID: 2025.emnlp-main.1479
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 29043–29056
URL: https://aclanthology.org/2025.emnlp-main.1479/
DOI: 10.18653/v1/2025.emnlp-main.1479
Cite (ACL): Haochen Shi, Shaobo Li, Guoqing Chao, Xiaoliang Shi, Wentao Chen, and Zhenzhou Ji. 2025. Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 29043–29056, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions (Shi et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1479.pdf
Checklist: 2025.emnlp-main.1479.checklist.pdf
