What About the Scene With the Hitler Reference? HAUNT: A Framework to Probe LLMs’ Self-consistency in Closed Domains Via Adversarial Nudge

Arka Dutta; Sujan Dutta; Rijul Magu; Soumyajit Datta; Munmun De Choudhury; Ashiqur R. KhudaBukhsh

What About the Scene With the Hitler Reference? HAUNT: A Framework to Probe LLMs’ Self-consistency in Closed Domains Via Adversarial Nudge

Arka Dutta, Sujan Dutta, Rijul Magu, Soumyajit Datta, Munmun De Choudhury, Ashiqur R. KhudaBukhsh

Abstract

Hallucinations pose a critical challenge to the real-world deployment of large language models (LLMs) in high-stakes domains. In this paper, we present a framework for stress testing factual fidelity in LLMs in the presence of adversarial nudge. Our framework consists of three steps. First, we instruct the LLM to produce sets of truths and lies consistent with the closed domain in question. Next, we instruct the LLM to verify the same set of assertions as truths and lies consistent with the same closed domain. Finally, we test the robustness of the LLM against the lies generated (and verified) by itself. Our extensive evaluation, conducted using five widely known proprietary and six open LLMs across two closed domains of popular movies and novels, reveals a wide range of susceptibility to adversarial nudges: even among the strongest proprietary LLMs, Claude exhibits strong resilience, GPT and Grok demonstrate moderate resilience, while Gemini and DeepSeek show weak resilience and open models fall short significantly.

Anthology ID:: 2026.acl-long.2169
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 46769–46791
Language:
URL:: https://aclanthology.org/2026.acl-long.2169/
DOI:
Bibkey:
Cite (ACL):: Arka Dutta, Sujan Dutta, Rijul Magu, Soumyajit Datta, Munmun De Choudhury, and Ashiqur R. KhudaBukhsh. 2026. What About the Scene With the Hitler Reference? HAUNT: A Framework to Probe LLMs’ Self-consistency in Closed Domains Via Adversarial Nudge. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 46769–46791, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: What About the Scene With the Hitler Reference? HAUNT: A Framework to Probe LLMs’ Self-consistency in Closed Domains Via Adversarial Nudge (Dutta et al., ACL 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.acl-long.2169.pdf
Checklist:: 2026.acl-long.2169.checklist.pdf

PDF Cite Search Checklist Fix data