Jaap Jumelet


2024

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk | Hosein Mohebbi | Gabriele Sarti | Willem Zuidema | Jaap Jumelet
Findings of the Association for Computational Linguistics: NAACL 2024

In recent years, several interpretability methods have been proposed to interpret the inner workings of Transformer models at different levels of precision and complexity. In this work, we propose a simple but effective technique to analyze encoder-decoder Transformers. Our method, which we name DecoderLens, allows the decoder to cross-attend representations of intermediate encoder activations instead of using the default final encoder output. The method thus maps uninterpretable intermediate vector representations to human-interpretable sequences of words or symbols, shedding new light on the information flow in this popular but understudied class of models. We apply DecoderLens to question answering, logical reasoning, speech recognition and machine translation models, finding that simpler subtasks are solved with high precision by low and intermediate encoder layers.
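
The core mechanism lends itself to a compact sketch in a HuggingFace-style setup: run the encoder once, keep its intermediate hidden states, and hand a chosen layer to the decoder as if it were the final encoder output. The checkpoint, layer index, and use of generate() below are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the DecoderLens idea: decode from an intermediate encoder layer
# instead of the final one. Checkpoint and layer index are illustrative.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

inputs = tokenizer("translate English to German: The house is small.",
                   return_tensors="pt")

with torch.no_grad():
    # Run the encoder once, keeping all intermediate hidden states.
    enc = model.encoder(**inputs, output_hidden_states=True)

    layer = 3  # intermediate encoder layer to decode from (illustrative choice)
    # Wrap that layer's activations as if they were the final encoder output.
    enc_out = BaseModelOutput(last_hidden_state=enc.hidden_states[layer])

    # Let the decoder cross-attend to the intermediate representation.
    generated = model.generate(encoder_outputs=enc_out,
                               attention_mask=inputs["attention_mask"],
                               max_new_tokens=20)

print(f"Layer {layer}:", tokenizer.decode(generated[0], skip_special_tokens=True))
```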

Do Language Models Exhibit Human-like Structural Priming Effects?
Jaap Jumelet | Willem Zuidema | Arabella Sinclair
Findings of the Association for Computational Linguistics: ACL 2024

We explore which linguistic factors—at the sentence and token level—play an important role in influencing language model predictions, and investigate whether these are reflective of results found in humans and human corpora (Gries and Kootstra, 2017). We make use of the structural priming paradigm—where recent exposure to a structure facilitates processing of the same structure—to investigate where priming effects manifest, and what factors predict them. We find these effects can be explained via the inverse frequency effect found in human priming, where rarer elements within a prime increase priming effects, as well as lexical dependence between prime and target. Our results provide an important piece in the puzzle of understanding how properties within their context affect structural prediction in language models.
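
One rough way to operationalise a priming effect in a language model is as the gain in target log-probability after a structurally congruent prime relative to an incongruent one. The sketch below uses GPT-2 and hand-written prepositional-object/double-object sentences purely for illustration; it is not the paper's metric or corpus.

```python
# Sketch: quantify a structural priming effect as the change in target
# log-probability after a congruent vs. incongruent prime. Illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def target_logprob(prime: str, target: str) -> float:
    """Summed log-probability of the target tokens, conditioned on the prime."""
    prime_ids = tokenizer(prime, return_tensors="pt").input_ids
    target_ids = tokenizer(" " + target, return_tensors="pt").input_ids
    input_ids = torch.cat([prime_ids, target_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    offset = prime_ids.size(1)
    # The token at position offset + i is predicted by the logits at offset + i - 1.
    return sum(log_probs[0, offset + i - 1, target_ids[0, i]].item()
               for i in range(target_ids.size(1)))

# Prepositional-object target after a congruent (PO) vs. incongruent (DO) prime.
target = "The teacher gave a book to the student."
po_prime = "The chef handed a plate to the waiter."
do_prime = "The chef handed the waiter a plate."

effect = target_logprob(po_prime, target) - target_logprob(do_prime, target)
print(f"Priming effect (log-probability difference): {effect:.3f}")
```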

Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Yonatan Belinkov | Najoung Kim | Jaap Jumelet | Hosein Mohebbi | Aaron Mueller | Hanjie Chen
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Transformer-specific Interpretability
Hosein Mohebbi | Jaap Jumelet | Michael Hanna | Afra Alishahi | Willem Zuidema
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Transformers have emerged as dominant players in various scientific fields, especially NLP. However, their inner workings, like many other neural networks, remain opaque. In spite of the widespread use of model-agnostic interpretability techniques, including gradient-based and occlusion-based methods, their shortcomings are becoming increasingly apparent for Transformer interpretation, making the field of interpretability more demanding today. In this tutorial, we will present Transformer-specific interpretability methods, a new trending approach that makes use of specific features of the Transformer architecture and is deemed more promising for understanding Transformer-based models. We start by discussing the potential pitfalls and misleading results model-agnostic approaches may produce when interpreting Transformers. Next, we discuss Transformer-specific methods, including those designed to quantify context-mixing interactions among all input pairs (as the fundamental property of the Transformer architecture) and those that combine causal methods with low-level Transformer analysis to identify particular subnetworks within a model that are responsible for specific tasks. By the end of the tutorial, we hope participants will understand the advantages (as well as current limitations) of Transformer-specific interpretability methods, along with how these can be applied to their own research.

Interpretability of Language Models via Task Spaces
Lucas Weber | Jaap Jumelet | Elia Bruni | Dieuwke Hupkes
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The usual way to interpret language models (LMs) is to test their performance on different benchmarks and subsequently infer their internal processes. In this paper, we present an alternative approach, concentrating on the _quality_ of LM processing, with a focus on their language abilities. To this end, we construct ‘linguistic task spaces’ – representations of an LM’s language conceptualisation – that shed light on the connections LMs draw between language phenomena. Task spaces are based on the interactions of the learning signals from different linguistic phenomena, which we assess via a method we call ‘similarity probing’. To disentangle the learning signals of linguistic phenomena, we further introduce a method called ‘fine-tuning via gradient differentials’ (FTGD). We apply our methods to language models of three different scales and find that larger models generalise better to overarching general concepts for linguistic tasks, making better use of their shared structure. Further, the distributedness of linguistic processing increases with pre-training through increased parameter sharing between related linguistic tasks. The overall generalisation patterns are mostly stable throughout training and not marked by incisive stages, potentially explaining the lack of successful curriculum strategies for LMs.
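
As a loose sketch of one possible reading of gradient-differential fine-tuning (the actual FTGD procedure may be defined differently; the batches, helper names, and update rule here are illustrative assumptions): update the model only with the component of the gradient that distinguishes phenomenon-specific data from matched control data.

```python
# Loose sketch of gradient-differential fine-tuning: keep only the part of the
# learning signal specific to a phenomenon by subtracting the gradient obtained
# on matched control data. loss_fn and the batches are hypothetical placeholders.
import torch

def gradient_differential_step(model, loss_fn, phenomenon_batch, control_batch, lr=1e-4):
    def grads(batch):
        model.zero_grad()
        loss_fn(model, batch).backward()
        return [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
                for p in model.parameters()]

    phen_grads = grads(phenomenon_batch)   # gradient on phenomenon sentences
    ctrl_grads = grads(control_batch)      # gradient on minimally different controls

    # Update with the differential only, removing the shared, generic signal.
    with torch.no_grad():
        for p, g_phen, g_ctrl in zip(model.parameters(), phen_grads, ctrl_grads):
            p -= lr * (g_phen - g_ctrl)
```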

2023

Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue
Aron Molnar | Jaap Jumelet | Mario Giulianelli | Arabella Sinclair
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

Language models are often used as the backbone of modern dialogue systems. These models are pre-trained on large amounts of written, fluent language. Repetition is typically penalised when evaluating language model generations; however, it is a key component of dialogue. Humans use local and partner-specific repetitions, which are preferred by human users and lead to more successful communication in dialogue. In this study, we evaluate (a) whether language models produce human-like levels of repetition in dialogue, and (b) what processing mechanisms related to lexical re-use they employ during comprehension. We believe that such joint analysis of model production and comprehension behaviour can inform the development of cognitively inspired dialogue generation systems.
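
A toy illustration of the kind of local repetition measure such an analysis builds on (a generic lexical-overlap score, not the paper's actual metric):

```python
# Sketch: a simple local lexical repetition score between a dialogue context
# and a (human or model-generated) response. Illustrative, not the paper's metric.
def repetition_score(context: str, response: str) -> float:
    """Fraction of response tokens that also occur in the local context."""
    context_tokens = set(context.lower().split())
    response_tokens = response.lower().split()
    if not response_tokens:
        return 0.0
    repeated = sum(tok in context_tokens for tok in response_tokens)
    return repeated / len(response_tokens)

print(repetition_score("shall we book the red room for the meeting?",
                       "sure, the red room works for me"))
```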

ChapGTP, ILLC’s Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation
Jaap Jumelet | Michael Hanna | Marianne de Heer Kloots | Anna Langedijk | Charlotte Pouw | Oskar van der Wal
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning

Feature Interactions Reveal Linguistic Structure in Language Models
Jaap Jumelet | Willem Zuidema
Findings of the Association for Computational Linguistics: ACL 2023

We study feature interactions in the context of feature attribution methods for post-hoc interpretability. In interpretability research, getting to grips with feature interactions is increasingly recognised as an important challenge, because interacting features are key to the success of neural networks. Feature interactions allow a model to build up hierarchical representations for its input, and might provide an ideal starting point for the investigation into linguistic structure in language models. However, uncovering the exact role that these interactions play is also difficult, and a diverse range of interaction attribution methods has been proposed. In this paper, we focus on the question of which of these methods most faithfully reflects the inner workings of the target models. We work out a grey-box methodology, in which we train models to perfection on a formal language classification task, using PCFGs. We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model. Based on these findings, we extend our evaluation to a case study on language models, providing novel insights into the linguistic structure that these models have acquired.
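
To give a flavour of the grey-box setup: models are trained on strings sampled from a fully known PCFG, so the grammatical rules they should recover are available by construction. The toy grammar and sampler below are illustrative stand-ins, not one of the paper's grammars.

```python
# Toy PCFG sampler: because the true rules are fully known, interaction
# attributions recovered from a model trained on this data can be checked
# against them. The grammar is an illustrative stand-in.
import random

# Each nonterminal maps to a list of (expansion, probability) pairs.
PCFG = {
    "S":   [(["NP", "VP"], 1.0)],
    "NP":  [(["Det", "N"], 0.7), (["Det", "Adj", "N"], 0.3)],
    "VP":  [(["V", "NP"], 0.6), (["V"], 0.4)],
    "Det": [(["the"], 0.5), (["a"], 0.5)],
    "Adj": [(["small"], 0.5), (["red"], 0.5)],
    "N":   [(["dog"], 0.5), (["ball"], 0.5)],
    "V":   [(["sees"], 0.5), (["chases"], 0.5)],
}

def sample(symbol="S"):
    if symbol not in PCFG:                      # terminal symbol
        return [symbol]
    expansions, probs = zip(*PCFG[symbol])
    expansion = random.choices(expansions, weights=probs, k=1)[0]
    return [tok for child in expansion for tok in sample(child)]

for _ in range(3):
    print(" ".join(sample()))
```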

Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution
Jaap Jumelet | Willem Zuidema
Findings of the Association for Computational Linguistics: EMNLP 2023

We present a setup for training, evaluating and interpreting neural language models that uses artificial, language-like data. The data is generated using a massive probabilistic grammar (based on state-split PCFGs) that is itself derived from a large natural language corpus, but also provides us with complete control over the generative process. We describe and release both grammar and corpus, and test the naturalness of our generated data. This approach allows us to define closed-form expressions to efficiently compute exact lower bounds on obtainable perplexity using both causal and masked language modelling. Our results show striking differences between neural language modelling architectures and training objectives in how closely they allow approximating the lower bound on perplexity. Our approach also allows us to directly compare learned representations to symbolic rules in the underlying source. We experiment with various techniques for interpreting model behaviour and learning dynamics. With access to the underlying true source, our results reveal striking differences in learning dynamics between different classes of words.
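
The intuition behind the perplexity bound: no model can, in expectation, do better than the conditional entropy of the true source. The paper derives this bound in closed form from the grammar; the sketch below merely estimates it empirically, assuming a hypothetical helper `true_next_token_dist` that returns the source's next-token distribution for a given prefix.

```python
# Sketch: estimate the cross-entropy lower bound as the average conditional
# entropy of the true source over corpus prefixes. true_next_token_dist is a
# hypothetical helper returning {token: probability} for a given prefix;
# the paper computes the bound exactly from the state-split PCFG instead.
import math

def cross_entropy_lower_bound(corpus, true_next_token_dist):
    total, n_positions = 0.0, 0
    for sentence in corpus:
        for t in range(len(sentence)):
            p = true_next_token_dist(sentence[:t])
            total += -sum(q * math.log(q) for q in p.values() if q > 0)
            n_positions += 1
    return total / n_positions        # perplexity bound = exp() of this value
```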

Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Yonatan Belinkov | Sophie Hao | Jaap Jumelet | Najoung Kim | Arya McCarthy | Hosein Mohebbi
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

2022

The Birth of Bias: A case study on the evolution of gender bias in an English language model
Oskar van der Wal | Jaap Jumelet | Katrin Schulz | Willem Zuidema
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Detecting and mitigating harmful biases in modern language models are widely recognized as crucial, open problems. In this paper, we take a step back and investigate how language models come to be biased in the first place. We use a relatively small language model, based on the LSTM architecture and trained on an English Wikipedia corpus. With full access to the data and to the model parameters as they change during every step of training, we can map in detail how the representation of gender develops, what patterns in the dataset drive this, and how the model’s internal state relates to the bias in a downstream task (semantic textual similarity). We find that the representation of gender is dynamic and identify different phases during training. Furthermore, we show that gender information is represented increasingly locally in the input embeddings of the model and that, as a consequence, debiasing these can be effective in reducing the downstream bias. Monitoring the training dynamics allows us to detect an asymmetry in how the female and male gender are represented in the input embeddings. This is important, as it may cause naive mitigation strategies to introduce new undesirable biases. We discuss the relevance of the findings for mitigation strategies more generally and the prospects of generalizing our methods to larger language models, the Transformer architecture, other languages and other undesirable biases.
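
As a generic illustration of the kind of embedding-level analysis involved (not the paper's actual method; the embedding matrix and vocabulary lookup are assumed to come from the trained LSTM), one can project input embeddings onto a he-she direction to see how strongly, and how locally, gender is encoded:

```python
# Generic sketch (not the paper's method): scalar projection of input embeddings
# onto a "he" - "she" direction. embedding_matrix and vocab are assumed to come
# from the trained LSTM (hypothetical names).
import torch

def gender_projection(embedding_matrix, vocab, words):
    he, she = embedding_matrix[vocab["he"]], embedding_matrix[vocab["she"]]
    direction = (he - she) / (he - she).norm()
    return {w: torch.dot(embedding_matrix[vocab[w]], direction).item() for w in words}

# Hypothetical usage:
# scores = gender_projection(model.embeddings.weight, vocab, ["nurse", "doctor", "teacher"])
```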

Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations
Arabella Sinclair | Jaap Jumelet | Willem Zuidema | Raquel Fernández
Transactions of the Association for Computational Linguistics, Volume 10

We investigate the extent to which modern neural language models are susceptible to structural priming, the phenomenon whereby the structure of a sentence makes the same structure more probable in a follow-up sentence. We explore how priming can be used to study the potential of these models to learn abstract structural information, which is a prerequisite for good performance on tasks that require natural language understanding skills. We introduce a novel metric and release Prime-LM, a large corpus where we control for various linguistic factors that interact with priming strength. We find that Transformer models indeed show evidence of structural priming, but also that the generalizations they learned are to some extent modulated by semantic information. Our experiments also show that the representations acquired by the models may not only encode abstract sequential structure but involve a certain level of hierarchical syntactic information. More generally, our study shows that the priming paradigm is a useful, additional tool for gaining insights into the capacities of language models and opens the door to future priming-based investigations that probe the model’s internal states.

2021

Attention vs non-attention for a Shapley-based explanation method
Tom Kersten | Hugh Mee Wong | Jaap Jumelet | Dieuwke Hupkes
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

The field of explainable AI has recently seen an explosion in the number of explanation methods for highly non-linear deep neural networks. The extent to which such methods – often proposed and tested in the domain of computer vision – are appropriate to address the explainability challenges in NLP is as yet relatively unexplored. In this work, we consider Contextual Decomposition (CD) – a Shapley-based input feature attribution method that has been shown to work well for recurrent NLP models – and we test the extent to which it is useful for models that contain attention operations. To this end, we extend CD to cover the operations necessary for attention-based models. We then compare how long-distance subject-verb relationships are processed by models with and without attention, considering a number of different syntactic structures in two different languages: English and Dutch. Our experiments confirm that CD can successfully be applied to attention-based models as well, providing an alternative Shapley-based attribution method for modern neural networks. In particular, using CD, we show that the English and Dutch models demonstrate similar processing behaviour, but that under the hood there are consistent differences between our attention and non-attention models.

Language Modelling as a Multi-Task Problem
Lucas Weber | Jaap Jumelet | Elia Bruni | Dieuwke Hupkes
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this paper, we propose to study language modelling as a multi-task problem, bringing together three strands of research: multi-task learning, linguistics, and interpretability. Based on hypotheses derived from linguistic theory, we investigate whether language models adhere to learning principles of multi-task learning during training. To showcase the idea, we analyse the generalisation behaviour of language models as they learn the linguistic concept of Negative Polarity Items (NPIs). Our experiments demonstrate that a multi-task setting naturally emerges within the objective of the more general task of language modelling. We argue that this insight is valuable for multi-task learning, linguistics and interpretability research and can lead to exciting new findings in all three domains.

Language Models Use Monotonicity to Assess NPI Licensing
Jaap Jumelet | Milica Denic | Jakub Szymanik | Dieuwke Hupkes | Shane Steinert-Threlkeld
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

diagNNose: A Library for Neural Activation Analysis
Jaap Jumelet
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

In this paper we introduce diagNNose, an open source library for analysing the activations of deep neural networks. diagNNose contains a wide array of interpretability techniques that provide fundamental insights into the inner workings of neural networks. We demonstrate the functionality of diagNNose with a case study on subject-verb agreement within language models. diagNNose is available at https://github.com/i-machine-think/diagnnose.
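
For diagNNose's actual API, the repository above is the authoritative reference. As a generic illustration of the kind of activation bookkeeping such a library automates, plain PyTorch forward hooks already go a long way:

```python
# Generic activation extraction with PyTorch forward hooks -- an illustration of
# the bookkeeping diagNNose automates, not its actual API.
import torch
import torch.nn as nn

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # nn.LSTM returns (all_hidden_states, (h_n, c_n)); store the whole tuple.
        activations[name] = output
    return hook

lstm = nn.LSTM(input_size=50, hidden_size=100, num_layers=2, batch_first=True)
lstm.register_forward_hook(save_activation("lstm"))

with torch.no_grad():
    lstm(torch.randn(1, 12, 50))            # a dummy 12-token input sequence

hidden_states, (h_n, c_n) = activations["lstm"]
print(hidden_states.shape)                  # (1, 12, 100): per-token activations
```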

2019

Analysing Neural Language Models: Contextual Decomposition Reveals Default Reasoning in Number and Gender Assignment
Jaap Jumelet | Willem Zuidema | Dieuwke Hupkes
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Extensive research has recently shown that recurrent neural language models are able to process a wide range of grammatical phenomena. How these models are able to perform these remarkable feats so well, however, is still an open question. To gain more insight into what information LSTMs base their decisions on, we propose a generalisation of Contextual Decomposition (GCD). In particular, this setup enables us to accurately distil which part of a prediction stems from semantic heuristics, which part truly emanates from syntactic cues, and which part arises from the model’s own biases. We investigate this technique on tasks pertaining to syntactic agreement and co-reference resolution, and discover that the model strongly relies on a default reasoning effect to perform these tasks.

2018

Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items
Jaap Jumelet | Dieuwke Hupkes
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

In this paper, we attempt to link the inner workings of a neural language model to linguistic theory, focusing on a complex phenomenon well discussed in formal linguistics: (negative) polarity items. We briefly discuss the leading hypotheses about the licensing contexts that allow negative polarity items and evaluate to what extent a neural language model has the ability to correctly process a subset of such constructions. We show that the model finds a relation between the licensing context and the negative polarity item and appears to be aware of the scope of this context, which we extract from a parse tree of the sentence. With this research, we hope to pave the way for other studies linking formal linguistics to deep learning.
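
A minimal illustration of the kind of NPI diagnostic involved: compare the probability a language model assigns to an NPI such as "any" immediately after a licensing (negated) context versus an unlicensed one. The model and sentences below are illustrative; the paper itself studies an LSTM language model.

```python
# Sketch: does a language model prefer the NPI "any" after a licensing context?
# Model and sentences are illustrative; the paper studies an LSTM language model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def npi_logprob(prefix: str, npi: str = " any") -> float:
    """Log-probability of the NPI token immediately after the prefix."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    npi_id = tokenizer(npi).input_ids[0]
    with torch.no_grad():
        logits = model(prefix_ids).logits
    return torch.log_softmax(logits[0, -1], dim=-1)[npi_id].item()

licensed = "The students did not have"      # negation licenses "any"
unlicensed = "The students have"            # no licenser present
print("licensed  :", npi_logprob(licensed))
print("unlicensed:", npi_logprob(unlicensed))
```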