Aaron Schein


2024

Activation Scaling for Steering and Interpreting Language Models
Niklas Stoehr | Kevin Du | Vésteinn Snæbjarnarson | Robert West | Ryan Cotterell | Aaron Schein
Findings of the Association for Computational Linguistics: EMNLP 2024

Given the prompt “Rome is in”, can we steer a language model to flip its prediction from an incorrect token “France” to a correct token “Italy” by multiplying only a few relevant activation vectors with scalars? We argue that successfully intervening on a model is a prerequisite for interpreting its internal workings. Concretely, we establish a three-term objective: a successful intervention should flip the correct with the wrong token and vice versa (effectiveness), leave other tokens unaffected (faithfulness), and be sparse (minimality). Using gradient-based optimization, this objective lets us learn (and later evaluate) a specific kind of efficient and interpretable intervention: activation scaling only modifies the signed magnitude of activation vectors to strengthen, weaken, or reverse the steering directions already encoded in the model. On synthetic tasks, this intervention performs comparably to steering vectors in terms of effectiveness and faithfulness, but is much more minimal, allowing us to pinpoint interpretable model components. We evaluate activation scaling from different angles, compare performance on different datasets, and make activation scalars a learnable function of the activation vectors themselves to generalize to varying-length prompts.
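The three-term objective lends itself to a compact illustration. The sketch below is not the paper's implementation: the frozen toy model, the token indices standing in for “France” and “Italy”, the margin, and the loss weights are all hypothetical, and PyTorch is assumed only to supply the gradient-based optimization.

```python
# Hypothetical sketch of activation scaling with an effectiveness / faithfulness /
# minimality objective. Toy model and hyperparameters are made up for illustration.
import torch

torch.manual_seed(0)

d_model, vocab = 16, 50
hidden = torch.randn(d_model)              # a frozen activation vector from some prompt
unembed = torch.randn(d_model, vocab)      # frozen unembedding matrix
wrong_tok, correct_tok = 7, 3              # stand-ins for "France" vs. "Italy"

# Activation scaling: one learnable signed scalar per activation dimension,
# initialized to 1 so the intervention starts as the identity.
alpha = torch.ones(d_model, requires_grad=True)
opt = torch.optim.Adam([alpha], lr=0.05)

base_logits = hidden @ unembed             # the model's original prediction

for step in range(200):
    logits = (alpha * hidden) @ unembed
    # Effectiveness: the correct token should overtake the wrong one (hinge with margin 1).
    effectiveness = torch.relu(logits[wrong_tok] - logits[correct_tok] + 1.0)
    # Faithfulness: logits of all other tokens should stay close to the original.
    mask = torch.ones(vocab, dtype=torch.bool)
    mask[[wrong_tok, correct_tok]] = False
    faithfulness = ((logits - base_logits)[mask] ** 2).mean()
    # Minimality: only a few scalars should deviate from 1 (sparse intervention).
    minimality = (alpha - 1.0).abs().sum()
    loss = effectiveness + faithfulness + 0.01 * minimality
    opt.zero_grad(); loss.backward(); opt.step()
```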

Context versus Prior Knowledge in Language Models
Kevin Du | Vésteinn Snæbjarnarson | Niklas Stoehr | Jennifer White | Aaron Schein | Ryan Cotterell
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To answer a question, language models often need to integrate prior knowledge learned during pretraining and new information presented in context. We hypothesize that models perform this integration in a predictable way across different questions and contexts: models will rely more on prior knowledge for questions about entities (e.g., persons, places, etc.) that they are more familiar with due to higher exposure in the training corpus, and be more easily persuaded by some contexts than others. To formalize this problem, we propose two mutual information-based metrics to measure a model’s dependency on a context and on its prior about an entity: first, the persuasion score of a given context represents how much a model depends on the context in its decision, and second, the susceptibility score of a given entity represents how much the model can be swayed away from its original answer distribution about an entity. We empirically test our metrics for their validity and reliability. Finally, we explore and find a relationship between the scores and the model’s expected familiarity with an entity, and provide two use cases to illustrate their benefits.
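As a rough illustration of how such scores could be computed from answer distributions, one reading (an assumption, not the paper's exact definitions) treats the persuasion score of a single context as the divergence between the contextual and prior answer distributions, and the susceptibility score of an entity as a mutual information over a set of contexts; all distributions below are invented.

```python
# Illustrative sketch, not the paper's code: persuasion as a KL divergence for one
# context, susceptibility as I(answer; context) over many contexts for one entity.
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def persuasion_score(p_with_context, p_prior):
    """How far a single context pulls the answer distribution away from the prior."""
    return kl(p_with_context, p_prior)

def susceptibility_score(p_with_contexts, context_probs):
    """Mutual information I(answer; context) for one entity over a set of contexts."""
    p_with_contexts = np.asarray(p_with_contexts)   # shape: (n_contexts, n_answers)
    context_probs = np.asarray(context_probs)       # shape: (n_contexts,)
    p_answer = context_probs @ p_with_contexts      # marginal answer distribution
    return sum(w * kl(p_c, p_answer)
               for w, p_c in zip(context_probs, p_with_contexts))

# Toy usage with made-up distributions over three candidate answers.
prior = [0.7, 0.2, 0.1]
with_ctx = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
print(persuasion_score(with_ctx[0], prior))
print(susceptibility_score(with_ctx, [0.5, 0.5]))
```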

2023

An Ordinal Latent Variable Model of Conflict Intensity
Niklas Stoehr | Lucas Torroba Hennigen | Josef Valvoda | Robert West | Ryan Cotterell | Aaron Schein
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Measuring the intensity of events is crucial for monitoring and tracking armed conflict. Advances in automated event extraction have yielded massive data sets of “who did what to whom” micro-records that enable data-driven approaches to monitoring conflict. The Goldstein scale is a widely-used expert-based measure that scores events on a conflictual–cooperative scale. It is based only on the action category (“what”) and disregards the subject (“who”) and object (“to whom”) of an event, as well as contextual information, like associated casualty count, that should contribute to the perception of an event’s “intensity”. This paper takes a latent variable-based approach to measuring conflict intensity. We introduce a probabilistic generative model that assumes each observed event is associated with a latent intensity class. A novel aspect of this model is that it imposes an ordering on the classes, such that higher-valued classes denote higher levels of intensity. The ordinal nature of the latent variable is induced from naturally ordered aspects of the data (e.g., casualty counts) where higher values naturally indicate higher intensity. We evaluate the proposed model both intrinsically and extrinsically, showing that it obtains comparatively good held-out predictive performance.
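A toy generative story in the spirit of the abstract, not the paper's actual model: each event has a latent intensity class, and the classes are ordered by construction because their Poisson rates for casualty counts are cumulative sums of positive increments. All parameters here are made up.

```python
# Illustrative sketch of an ordinal latent intensity model (assumed construction).
import numpy as np

rng = np.random.default_rng(0)
K = 4                                            # number of latent intensity classes

# Ordering induced in the parameters: rates are cumulative sums of positive increments,
# so lambda_1 < lambda_2 < ... < lambda_K by construction.
increments = rng.gamma(shape=2.0, scale=1.5, size=K)
poisson_rates = np.cumsum(increments)

class_probs = rng.dirichlet(np.ones(K))          # prior over intensity classes

def generate_event():
    k = rng.choice(K, p=class_probs)             # latent intensity class
    casualties = rng.poisson(poisson_rates[k])   # higher class -> higher expected count
    return k, casualties

events = [generate_event() for _ in range(5)]
print(events)
```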

Sentiment as an Ordinal Latent Variable
Niklas Stoehr | Ryan Cotterell | Aaron Schein
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Sentiment analysis has become a central tool in various disciplines outside of natural language processing. In particular, in applied and domain-specific settings with strong requirements for interpretable methods, dictionary-based approaches remain a popular choice. However, existing dictionaries are often limited in coverage, static once annotation is completed, and their sentiment scales differ widely; some are discrete, others continuous. We propose a Bayesian generative model that learns a composite sentiment dictionary as an interpolation between six existing dictionaries with different scales. We argue that sentiment is a latent concept with intrinsically ranking-based characteristics: the word “excellent” may be ranked as more positive than “great” and “okay”, but it is hard to express exactly how much more. This prompts us to enforce an ordinal scale of ordered discrete sentiment values in our dictionary. We achieve this through an ordering transformation in the priors of our model. We evaluate the model intrinsically by imputing missing values in existing dictionaries. Moreover, we conduct extrinsic evaluations through sentiment classification tasks. Finally, we present two extensions: first, a method to augment dictionary-based approaches with word embeddings to construct sentiment scales along new semantic axes; second, a Latent Dirichlet Allocation-inspired variant of our model that learns document topics that are ordered by sentiment.
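A minimal sketch of what an “ordering transformation” in the priors might look like; this is an assumed construction (positivity via exponentiation plus a cumulative sum), not the paper's exact prior, and the numbers are arbitrary.

```python
# Sketch: map unconstrained reals to a strictly increasing (ordered) sentiment scale.
import numpy as np

def ordered_scale(raw, low=-3.0):
    """Map unconstrained reals to strictly increasing sentiment cutpoints."""
    increments = np.exp(np.asarray(raw))   # positivity via exp
    return low + np.cumsum(increments)     # cumulative sum enforces the ordering

# Toy usage: five unconstrained parameters become five ordered cutpoints, which
# could separate six discrete sentiment values from very negative to very positive.
print(ordered_scale([0.1, -0.5, 0.3, 0.0, -1.0]))
```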

2012

International Multicultural Name Matching Competition: Design, Execution, Results, and Lessons Learned
Keith J. Miller | Elizabeth Schroeder Richerson | Sarah McLeod | James Finley | Aaron Schein
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes different aspects of an open competition to evaluate multicultural name matching software, including the contest design, development of the test data, the different phases of the competition, the behavior of the participating teams, the results of the competition, and lessons learned throughout. The competition, known as The MITRE Challenge™, was informally announced at LREC 2010 and was recently concluded. Contest participants used the competition website (http://mitrechallenge.mitre.org) to download the competition data set and guidelines, upload results, and view accuracy metrics for each result set submitted. Participants were allowed to submit unlimited result sets, with their top-scoring set determining their overall ranking. The competition website featured a leader board that displayed the top score for each participant, ranked according to the principal contest metric, mean average precision (MAP). MAP and other metrics were calculated in near-real time on a remote server, based on ground truth developed for the competition data set. Additional measures were taken to guard against gaming the competition metric or overfitting to the competition data set. Lessons learned from running this first MITRE Challenge will be valuable to others considering running similar evaluation campaigns.
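For reference, mean average precision can be computed from its standard definition; the sketch below uses invented ranked lists and relevance labels, not the competition's data or scoring code.

```python
# Sketch of mean average precision (MAP) from its standard definition.
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    relevant_ids = set(relevant_ids)
    hits, precisions = 0, []
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """MAP over a set of queries: runs is a list of (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy usage: two name-matching queries with hand-made rankings.
runs = [(["a", "b", "c", "d"], {"a", "c"}),
        (["x", "y", "z"], {"y"})]
print(mean_average_precision(runs))
```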