Ghost Context: Measuring Cross-Context Interference in Long-Context Language Models

Rohith Namboothiri

Abstract

Long-context language models assemble prompts from heterogeneous sources, and deployed systems implicitly trust the model to use the correct span of context. We show that this assumption is often violated: irrelevant spans can silently shape outputs, producing errors that are neither fabrication nor omission but misattributed grounding, that is, claims supported by the wrong part of the input context. Unlike intrinsic hallucination (contradicting the source) or extrinsic hallucination (introducing unsupported claims), misattributed grounding uses real evidence from an incorrect span, making it invisible to standard source-blind faithfulness metrics. We formalize this phenomenon as Ghost Context and introduce a causal mask-and-rerun attribution protocol to measure it. Across a 272-case corpus spanning multiple interference scenarios, we evaluate three widely used models and report two complementary signals: strict Ghost Context Rate (GCR), which captures verifiable factual misattribution, and open-ended influence, which captures broader contextual shaping effects. Under realistic contextual conflict, strict GCR spikes substantially: temporal contradictions trigger misattributed grounding in 38.3% of cases. Across all scenarios, open-ended distractor influence occurs in 20.4% of evaluations. Importantly, Ghost Context is not only detectable but also remediable. Masking the single highest-attributed distractor span resolves 95.5% of detected errors (Fix@1) with 2.4% collateral damage and zero false positives on negative controls. We also introduce Contextual Invariance Rate (CIR) as a system-level robustness metric measuring invariance to irrelevant context. Our findings show that contextual conflict, which is common in retrieval-augmented generation and agent systems, can systematically degrade reliability, but also reveal that Ghost Context errors are causally localizable and cheaply correctable.
We release the evaluation corpus, detection pipeline, and experimental results to support further research on trustworthy long-context language model evaluation.

Anthology ID: 2026.trustnlp-main.19
Volume: Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Month: July
Year: 2026
Address: San Diego, California
Editors: Kai-Wei Chang, Ninareh Mehrabi, Satyapriya Krishna, Anubrata Das, Jwala Dhamala, Yang Trista Cao, Tharindu Kumarage, Anil Ramakrishna, Christos Christodoulopoulos, Yixin Wan, Aram Galstyan, Anoop Kumar, Rahul Gupta
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 316–329
URL: https://aclanthology.org/2026.trustnlp-main.19/
Cite (ACL): Rohith Namboothiri. 2026. Ghost Context: Measuring Cross-Context Interference in Long-Context Language Models. In Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026), pages 316–329, San Diego, California. Association for Computational Linguistics.
Cite (Informal): Ghost Context: Measuring Cross-Context Interference in Long-Context Language Models (Namboothiri, TrustNLP 2026)
PDF: https://aclanthology.org/2026.trustnlp-main.19.pdf
