Irina Bigoulaeva

2026

Patches of Nonlinearity: Instruction Vectors in Large Language Models
Irina Bigoulaeva | Jonas Rohweder | Subhabrata Dutta | Iryna Gurevych
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the recent success of instruction-tuned language models and their ubiquitous usage, very little is known of how models process instructions internally. In this work, we address this gap from a mechanistic point of view by investigating how instruction-specific representations are constructed and utilized in different stages of post-training: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Via causal mediation, we identify that instruction representation is fairly localized in models. These representations, which we call Instruction Vectors (IVs), demonstrate a curious juxtaposition of linear separability along with non-linear causal interaction, broadly questioning the scope of the linear representation hypothesis commonplace in mechanistic interpretability. To disentangle the non-linear causal interaction, we propose a novel method to localize information processing in language models that is free from the implicit linear assumptions of patching-based techniques. We find that, conditioned on the task representations formed in the early layers, different information pathways are selected in the later layers to solve that task, i.e., IVs act as circuit selectors.

2024

pdf bib abs

Are Emergent Abilities in Large Language Models just In-Context Learning?
Sheng Lu | Irina Bigoulaeva | Rachneet Sachdeva | Harish Tayyar Madabushi | Iryna Gurevych
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models, comprising billions of parameters and pre-trained on extensive web-scale corpora, have been claimed to acquire certain capabilities without having been specifically trained on them. These capabilities, referred to as “emergent abilities,” have been a driving force in discussions regarding the potentials and risks of language models. A key challenge in evaluating emergent abilities is that they are confounded by model competencies that arise through alternative prompting techniques, including in-context learning, which is the ability of models to complete a task based on a few examples. We present a novel theory that explains emergent abilities, taking into account their potential confounding factors, and rigorously substantiate this theory through over 1000 experiments. Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge. Our work is a foundational step in explaining language model performance, providing a template for their efficient use and clarifying the paradox of their ability to excel in some instances while faltering in others. Thus, we demonstrate that their capabilities should not be overestimated.

2022

pdf bib abs

Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5
Irina Bigoulaeva | Rachneet Singh Sachdeva | Harish Tayyar Madabushi | Aline Villavicencio | Iryna Gurevych
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

We compare sequential fine-tuning with a model for multi-task learning in the context where we are interested in boosting performance on two of the tasks, one of which depends on the other. We test these models on the FigLang2022 shared task which requires participants to predict language inference labels on figurative language along with corresponding textual explanations of the inference predictions. Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting. Our findings show that simple sequential fine-tuning of text-to-text models is an extraordinarily powerful method of achieving cross-task knowledge transfer while simultaneously predicting multiple interdependent targets. So much so, that our best model achieved the (tied) highest score on the task.

2021

pdf bib abs

Cross-Lingual Transfer Learning for Hate Speech Detection
Irina Bigoulaeva | Viktor Hangya | Alexander Fraser
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

We address the task of automatic hate speech detection for low-resource languages. Rather than collecting and annotating new hate speech data, we show how to use cross-lingual transfer learning to leverage already existing data from higher-resource languages. Using bilingual word embeddings based classifiers we achieve good performance on the target language by training only on the source dataset. Using our transferred system we bootstrap on unlabeled target language data, improving the performance of standard cross-lingual transfer approaches. We use English as a high resource language and German as the target language for which only a small amount of annotated corpora are available. Our results indicate that cross-lingual transfer learning together with our approach to leverage additional unlabeled data is an effective way of achieving good performance on low-resource target languages without the need for any target-language annotations.

Co-authors

Sheng Lu 1

Jonas Rohweder 1

Rachneet Sachdeva 1

Rachneet Singh Sachdeva 1

Aline Villavicencio 1

Venues

Fix author