Stefan Arnold

2025

Steering Prepositional Phrases in Language Models: A Case of with-headed Adjectival and Adverbial Complements in Gemma-2
Stefan Arnold | Rene Gröbner
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Language Models, when generating prepositional phrases, must often decide whether their complements function as an instrumental adjunct (describing the verb adverbially) or an attributive modifier (enriching the noun adjectivally), yet the internal mechanisms that resolve this split decision remain poorly understood. In this study, we conduct a targeted investigation into Gemma-2 to uncover and control the generation of prepositional complements. We assemble a prompt suite containing with-headed prepositional phrases whose contexts equally accommodate either an instrumental or attributive continuation, revealing a strong preference for an instrumental reading at a ratio of 3:4. To pinpoint individual attention heads that favor instrumental over attributive complements, we project activations into the vocabulary space. By scaling the value vector of a single attention head, we can shift the distribution of functional roles of complements, attenuating instruments to 33% while elevating attributes to 36%.
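The two operations named in this abstract, projecting activations into the vocabulary space and scaling one attention head, can be illustrated with a toy model. This is a sketch under strong assumptions, not the authors' setup: the head names, the two-token vocabulary, and all numbers are hypothetical, and layer normalization and later layers are ignored. Because attention output is linear in the value vectors, scaling a head's values by a factor is modeled here as scaling that head's output by the same factor.

```python
# Toy direct-logit view: each head writes a vector into the residual stream,
# and an unembedding matrix maps that stream to token logits.
VOCAB = ["hammer", "beard"]  # hypothetical instrument vs. attribute tokens

# Hypothetical per-head outputs (output projection already applied).
HEAD_OUTPUTS = {
    "head_a": [1.0, 0.0],  # pushes toward the instrumental token
    "head_b": [0.0, 0.6],  # pushes toward the attributive token
}

# Hypothetical unembedding matrix: one row per vocabulary token.
UNEMBED = [
    [1.0, 0.0],
    [0.0, 1.0],
]


def project_to_vocab(vector):
    """Map a residual-stream vector to one logit per vocabulary token."""
    return [sum(w * x for w, x in zip(row, vector)) for row in UNEMBED]


def logits_with_scaling(scales):
    """Sum the head outputs, scaling each head by scales.get(name, 1.0)."""
    residual = [0.0] * len(UNEMBED[0])
    for name, output in HEAD_OUTPUTS.items():
        alpha = scales.get(name, 1.0)
        for i, x in enumerate(output):
            residual[i] += alpha * x
    return project_to_vocab(residual)


def top_token(logits):
    """Return the vocabulary token with the highest logit."""
    return VOCAB[max(range(len(logits)), key=logits.__getitem__)]
```

Projecting a single head's output with `project_to_vocab` shows which tokens that head promotes, and calling `logits_with_scaling({"head_a": 0.2})` shows how attenuating one head can change the preferred continuation in this toy setting.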

Memorization in Language Models through the Lens of Intrinsic Dimension
Stefan Arnold
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)

Language Models (LMs) are prone to memorizing parts of their data during training and unintentionally emitting them at generation time, raising concerns about privacy leakage and disclosure of intellectual property. While previous research has identified properties such as context length, parameter size, and duplication frequency as key drivers of unintended memorization, little is known about how the latent structure modulates this rate of memorization. We investigate the role of Intrinsic Dimension (ID), a geometric proxy for the structural complexity of a sequence in latent space, in modulating memorization. Our findings suggest that ID acts as a suppressive signal for memorization: compared to low-ID sequences, high-ID sequences are less likely to be memorized, particularly in overparameterized models and under sparse exposure. These findings highlight the interaction between scale, exposure, and complexity in shaping memorization.
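Intrinsic dimension can be estimated in several ways, and the abstract does not say which estimator is used. Purely as an illustration, the sketch below implements the maximum-likelihood form of the TwoNN estimator, which divides the number of points by the summed log ratios of each point's second to first nearest-neighbor distance. The function name and the synthetic point clouds are assumptions made for this example; in practice the points would be representations of a sequence in latent space.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """Estimate intrinsic dimension from ratios of the two nearest-neighbor distances."""
    log_ratios = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0.0:  # skip exact duplicates, which give an undefined ratio
            log_ratios.append(math.log(r2 / r1))
    # Maximum-likelihood estimate: usable points divided by the summed log ratios.
    return len(log_ratios) / sum(log_ratios)


# Synthetic data: a line and a plane, both embedded in three dimensions.
rng = random.Random(0)
line = [(t, 2.0 * t, -t) for t in (rng.random() for _ in range(500))]
plane = [(u, v, u + v) for u, v in ((rng.random(), rng.random()) for _ in range(500))]

print(two_nn_intrinsic_dimension(line))
print(two_nn_intrinsic_dimension(plane))
```

Both point clouds live in a three-dimensional ambient space, but the estimates should land near the dimension of the underlying line and plane rather than near three.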

Inspecting the Representation Manifold of Differentially-Private Text
Stefan Arnold
Proceedings of the Sixth Workshop on Privacy in Natural Language Processing

Differential Privacy (DP) for text has recently taken the form of text paraphrasing using language models and temperature sampling to better balance privacy and utility. However, the geometric distortion of DP regarding the structure and complexity in the representation space remains unexplored. By estimating the intrinsic dimension of paraphrased text across varying privacy budgets, we find that word-level methods severely raise the representation manifold, while sentence-level methods produce paraphrases whose manifolds are topologically more consistent with human-written paraphrases. Among sentence-level methods, masked paraphrasing, compared to causal paraphrasing, demonstrates superior preservation of structural complexity, suggesting that autoregressive generation propagates distortions from unnatural word choices that cascade and inflate the representation space.

2024

Routing in Sparsely-gated Language Models responds to Context
Stefan Arnold | Marian Fietta | Dilara Yesilbas
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Language Models (LMs) have recently incorporated mixture-of-experts layers, consisting of a router and a collection of experts, to scale up their parameter count under a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace routing decisions of similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.

Characterizing Stereotypical Bias from Privacy-preserving Pre-Training
Stefan Arnold | Rene Gröbner | Annika Schreiner
Proceedings of the Fifth Workshop on Privacy in Natural Language Processing

Differential Privacy (DP) can be applied to raw text by exploiting the spatial arrangement of words in an embedding space. We investigate the implications of such text privatization on Language Models (LMs) and their tendency towards stereotypical associations. Since previous studies documented that linguistic proficiency correlates with stereotypical bias, one could assume that techniques for text privatization, which are known to degrade language modeling capabilities, would cancel out undesirable biases. By testing BERT models trained on texts containing biased statements primed with varying degrees of privacy, our study reveals that while stereotypical bias generally diminishes when privacy is tightened, text privatization does not uniformly equate to diminishing bias across all social domains. This highlights the need for careful diagnosis of bias in LMs that undergo text privatization.

2023

Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold | Nils Kemmerzell | Annika Schreiner
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known for degrading the performance of language models when trained on perturbed text. Employing a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text, we intend to disentangle at the linguistic level the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms, encoding localized properties of words while falling short at encoding the contextual relationships between spans of words.

Driving Context into Text-to-Text Privatization
Stefan Arnold | Dilara Yesilbas | Sven Weinzierl
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)

Metric Differential Privacy enables text-to-text privatization by adding calibrated noise to the vector of a word derived from an embedding space and projecting this noisy vector back to a discrete vocabulary using a nearest neighbor search. Since words are substituted without context, this mechanism is expected to fall short at finding substitutes for words with ambiguous meanings, such as ‘bank’. To account for these ambiguous words, we leverage a sense embedding and incorporate a sense disambiguation step prior to noise injection. We accompany our modification to the privatization mechanism with an estimation of privacy and utility. For word sense disambiguation on the Words in Context dataset, we demonstrate a substantial increase in classification accuracy of 6.05%.
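The base mechanism described in this abstract (add noise to a word vector, then map the noisy vector back to the nearest vocabulary word) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-dimensional embedding table and the function names are hypothetical, and the per-coordinate Laplace noise with scale 1/epsilon is a simplifying assumption that is not claimed to satisfy a formal metric DP guarantee. The sense disambiguation step is omitted.

```python
import math
import random

# Hypothetical toy embedding table; real systems use pretrained embeddings.
EMBEDDINGS = {
    "bank": (0.9, 0.1),
    "river": (0.8, 0.3),
    "money": (0.1, 0.9),
    "loan": (0.2, 0.8),
}


def laplace(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest_word(vector, embeddings):
    """Return the vocabulary word whose embedding is closest to the vector."""
    return min(embeddings, key=lambda w: math.dist(vector, embeddings[w]))


def privatize_word(word, epsilon, embeddings=EMBEDDINGS, rng=None):
    """Perturb the word's embedding and project it back to the vocabulary."""
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # assumed calibration: smaller epsilon, more noise
    noisy = tuple(x + laplace(scale, rng) for x in embeddings[word])
    return nearest_word(noisy, embeddings)
```

With a very large `epsilon` the noise is negligible and `privatize_word("bank", 1e6)` returns the input word, while small values of `epsilon` make substitutions by other vocabulary words increasingly likely.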

Guiding Text-to-Text Privatization by Syntax
Stefan Arnold | Dilara Yesilbas | Sven Weinzierl
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)

Metric Differential Privacy is a generalization of differential privacy tailored to address the unique challenges of text-to-text privatization. By adding noise to the representation of words in the geometric space of embeddings, words are replaced with words located in the proximity of the noisy representation. Since embeddings are trained based on word co-occurrences, this mechanism ensures that substitutions stem from a common semantic context. Without considering the grammatical category of words, however, this mechanism cannot guarantee that substitutions play similar syntactic roles. We analyze the capability of text-to-text privatization to preserve the grammatical category of words after substitution and find that surrogate texts consist almost exclusively of nouns. Lacking the capability to produce surrogate texts that correlate with the structure of the sensitive texts, we extend our analysis by transforming the privatization step into a candidate selection problem in which substitutions are directed to words with matching grammatical properties. We demonstrate a substantial improvement in the performance of downstream tasks by up to 4.66% while retaining comparable privacy guarantees.
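One way to read the candidate selection step is as a nearest neighbor search restricted to words that share the grammatical category of the original word. The sketch below illustrates that reading only. The embedding table, the part-of-speech tags, and the function names are hypothetical, and the noisy vector is passed in directly, so the noise injection and its privacy accounting are outside the scope of the example.

```python
import math

# Hypothetical toy embeddings and part-of-speech tags.
EMBEDDINGS = {
    "run": (0.50, 0.50),
    "jog": (0.56, 0.46),
    "walk": (0.60, 0.40),
    "shoe": (0.52, 0.50),
    "street": (0.90, 0.10),
}
POS_TAGS = {
    "run": "VERB",
    "jog": "VERB",
    "walk": "VERB",
    "shoe": "NOUN",
    "street": "NOUN",
}


def unconstrained_substitute(noisy_vector, embeddings=EMBEDDINGS):
    """Nearest neighbor over the whole vocabulary, ignoring grammar."""
    return min(embeddings, key=lambda w: math.dist(noisy_vector, embeddings[w]))


def select_substitute(word, noisy_vector, embeddings=EMBEDDINGS, pos_tags=POS_TAGS):
    """Nearest neighbor among words sharing the original word's tag."""
    # The original word always matches its own tag, so candidates is never empty.
    candidates = [w for w in embeddings if pos_tags[w] == pos_tags[word]]
    return min(candidates, key=lambda w: math.dist(noisy_vector, embeddings[w]))
```

For the noisy vector `(0.55, 0.50)` derived from the verb "run", the unconstrained search picks the noun "shoe", whereas the constrained search stays within the verbs and picks "jog".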
