Nandi Schoots
2026
Punctuations and Predicates in Language Models
Sonakshi Chauhan | Maheep Chaudhary | Choy Kwan Kiu | Samuel Nellessen | Nandi Schoots
Findings of the Association for Computational Linguistics: EACL 2026
Sonakshi Chauhan | Maheep Chaudhary | Choy Kwan Kiu | Samuel Nellessen | Nandi Schoots
Findings of the Association for Computational Linguistics: EACL 2026
In this paper we explore where information is collected and how it is propagated throughout layers in large language models (LLMs). We begin by examining the surprising computational importance of punctuation tokens which previous work has identified as attention sinks and memory aids. Using intervention-based techniques, we evaluate the necessity and sufficiency of punctuation tokens across layers in GPT-2, DeepSeek, and Gemma. Our results show stark model-specific differences: for GPT-2, punctuation is both necessary and sufficient in multiple layers, while this holds far less in DeepSeek and not at all in Gemma. Extending beyond punctuation, we ask whether LLMs process different components of input (e.g., subjects, adjectives, punctuation, full sentences) by forming early static summaries reused across the network, or if the model remains sensitive to changes in these components across layers. We investigate whether different reasoning rules are processed differently by LLMs. In particular, through interchange intervention and layer-swapping experiments, we find that conditional statements (if, then), and universal quantification (for all) are processed very differently. Our findings offer new insight into the internal mechanisms of punctuation usage and reasoning in LLMs and have implications for interpretability and model analysis.
2025
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Yohan Mathew | Ollie Matthews | Robert McCarthy | Joan Velja | Christian Schroeder de Witt | Dylan Cope | Nandi Schoots
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Yohan Mathew | Ollie Matthews | Robert McCarthy | Joan Velja | Christian Schroeder de Witt | Dylan Cope | Nandi Schoots
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions.Collusion to the disadvantage of others has been identified as a central form of undesirable agent cooperation.The use of information hiding (steganography) in agent communications could render such collusion practically undetectable.This underscores the need for investigations into the possibility of such behaviours emerging and the robustness corresponding countermeasures.To investigate this problem we design two approaches – a gradient-based reinforcement learning (GBRL) method and an in-context reinforcement learning (ICRL) method – for reliably eliciting sophisticated LLM-generated linguistic text steganography.We demonstrate, for the first time, that unintended steganographic collusion in LLMs can arise due to mispecified reward incentives during training.Additionally, we find that standard mitigations — both passive oversight of model outputs and active mitigation through communication paraphrasing — are not fully effective at preventing this steganographic communication.Our findings imply that (i) emergence of steganographic collusion is a plausible concern that should be monitored and researched, and (ii) preventing emergence may require innovation in mitigation techniques.