Jaap Jumelet


2022

pdf bib
The Birth of Bias: A case study on the evolution of gender bias in an English language model
Oskar Van Der Wal | Jaap Jumelet | Katrin Schulz | Willem Zuidema
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Detecting and mitigating harmful biases in modern language models are widely recognized as crucial, open problems. In this paper, we take a step back and investigate how language models come to be biased in the first place.We use a relatively small language model, using the LSTM architecture trained on an English Wikipedia corpus. With full access to the data and to the model parameters as they change during every step while training, we can map in detail how the representation of gender develops, what patterns in the dataset drive this, and how the model’s internal state relates to the bias in a downstream task (semantic textual similarity).We find that the representation of gender is dynamic and identify different phases during training.Furthermore, we show that gender information is represented increasingly locally in the input embeddings of the model and that, as a consequence, debiasing these can be effective in reducing the downstream bias.Monitoring the training dynamics, allows us to detect an asymmetry in how the female and male gender are represented in the input embeddings. This is important, as it may cause naive mitigation strategies to introduce new undesirable biases.We discuss the relevance of the findings for mitigation strategies more generally and the prospects of generalizing our methods to larger language models, the Transformer architecture, other languages and other undesirable biases.

2021

pdf bib
Attention vs non-attention for a Shapley-based explanation method
Tom Kersten | Hugh Mee Wong | Jaap Jumelet | Dieuwke Hupkes
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

The field of explainable AI has recently seen an explosion in the number of explanation methods for highly non-linear deep neural networks. The extent to which such methods – that are often proposed and tested in the domain of computer vision – are appropriate to address the explainability challenges in NLP is yet relatively unexplored. In this work, we consider Contextual Decomposition (CD) – a Shapley-based input feature attribution method that has been shown to work well for recurrent NLP models – and we test the extent to which it is useful for models that contain attention operations. To this end, we extend CD to cover the operations necessary for attention-based models. We then compare how long distance subject-verb relationships are processed by models with and without attention, considering a number of different syntactic structures in two different languages: English and Dutch. Our experiments confirm that CD can successfully be applied for attention-based models as well, providing an alternative Shapley-based attribution method for modern neural networks. In particular, using CD, we show that the English and Dutch models demonstrate similar processing behaviour, but that under the hood there are consistent differences between our attention and non-attention models.

pdf bib
Language Modelling as a Multi-Task Problem
Lucas Weber | Jaap Jumelet | Elia Bruni | Dieuwke Hupkes
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this paper, we propose to study language modelling as a multi-task problem, bringing together three strands of research: multi-task learning, linguistics, and interpretability. Based on hypotheses derived from linguistic theory, we investigate whether language models adhere to learning principles of multi-task learning during training. To showcase the idea, we analyse the generalisation behaviour of language models as they learn the linguistic concept of Negative Polarity Items (NPIs). Our experiments demonstrate that a multi-task setting naturally emerges within the objective of the more general task of language modelling. We argue that this insight is valuable for multi-task learning, linguistics and interpretability research and can lead to exciting new findings in all three domains.

pdf bib
Language Models Use Monotonicity to Assess NPI Licensing
Jaap Jumelet | Milica Denic | Jakub Szymanik | Dieuwke Hupkes | Shane Steinert-Threlkeld
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
diagNNose: A Library for Neural Activation Analysis
Jaap Jumelet
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

In this paper we introduce diagNNose, an open source library for analysing the activations of deep neural networks. diagNNose contains a wide array of interpretability techniques that provide fundamental insights into the inner workings of neural networks. We demonstrate the functionality of diagNNose with a case study on subject-verb agreement within language models. diagNNose is available at https://github.com/i-machine-think/diagnnose.

2019

pdf bib
Analysing Neural Language Models: Contextual Decomposition Reveals Default Reasoning in Number and Gender Assignment
Jaap Jumelet | Willem Zuidema | Dieuwke Hupkes
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Extensive research has recently shown that recurrent neural language models are able to process a wide range of grammatical phenomena. How these models are able to perform these remarkable feats so well, however, is still an open question. To gain more insight into what information LSTMs base their decisions on, we propose a generalisation of Contextual Decomposition (GCD). In particular, this setup enables us to accurately distil which part of a prediction stems from semantic heuristics, which part truly emanates from syntactic cues and which part arise from the model biases themselves instead. We investigate this technique on tasks pertaining to syntactic agreement and co-reference resolution and discover that the model strongly relies on a default reasoning effect to perform these tasks.

2018

pdf bib
Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items
Jaap Jumelet | Dieuwke Hupkes
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

In this paper, we attempt to link the inner workings of a neural language model to linguistic theory, focusing on a complex phenomenon well discussed in formal linguistics: (negative) polarity items. We briefly discuss the leading hypotheses about the licensing contexts that allow negative polarity items and evaluate to what extent a neural language model has the ability to correctly process a subset of such constructions. We show that the model finds a relation between the licensing context and the negative polarity item and appears to be aware of the scope of this context, which we extract from a parse tree of the sentence. With this research, we hope to pave the way for other studies linking formal linguistics to deep learning.