Svetlana Kiritchenko - ACL Anthology

Svetlana Kiritchenko

2025

When Detection Fails: The Power of Fine-Tuned Models to Generate Human-Like Social Media Text
Hillary Dawkins | Kathleen C. Fraser | Svetlana Kiritchenko
Findings of the Association for Computational Linguistics: ACL 2025

Detecting AI-generated text is a difficult problem to begin with; detecting AI-generated text on social media is made even more difficult due to the short text length and informal, idiosyncratic language of the internet. It is nonetheless important to tackle this problem, as social media represents a significant attack vector in online influence campaigns, which may be bolstered through the use of mass-produced AI-generated posts supporting (or opposing) particular policies, decisions, or events. We approach this problem with the mindset and resources of a reasonably sophisticated threat actor, and create a dataset of 505,159 AI-generated social media posts from a combination of open-source, closed-source, and fine-tuned LLMs, covering 11 different controversial topics. We show that while the posts can be detected under typical research assumptions about knowledge of and access to the generating models, under the more realistic assumption that an attacker will not release their fine-tuned model to the public, detectability drops dramatically. This result is confirmed with a human study. Ablation experiments highlight the vulnerability of various detection algorithms to fine-tuned LLMs. This result has implications across all detection domains, since fine-tuning is a generally applicable and realistic LLM use case.

Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency
Kathleen C. Fraser | Hillary Dawkins | Isar Nejadgholi | Svetlana Kiritchenko
Proceedings of the The First Workshop on LLM Security (LLMSEC)

Fine-tuning a general-purpose large language model (LLM) for a specific domain or task has become a routine procedure for ordinary users. However, fine-tuning is known to remove the safety alignment features of the model, even when the fine-tuning data does not contain any harmful content. We consider this to be a critical failure mode of LLMs due to the widespread uptake of fine-tuning, combined with the benign nature of the “attack”. Most well-intentioned developers are likely unaware that they are deploying an LLM with reduced safety. On the other hand, this known vulnerability can be easily exploited by malicious actors intending to bypass safety guardrails. To make any meaningful progress in mitigating this issue, we first need reliable and reproducible safety evaluations. In this work, we investigate how robust a safety benchmark is to trivial variations in the experimental procedure, and the stochastic nature of LLMs. Our initial experiments expose surprising variance in the results of the safety evaluation, even when seemingly inconsequential changes are made to the fine-tuning setup. Our observations have serious implications for how researchers in this field should report results to enable meaningful comparisons in the future.

Tackling Social Bias against the Poor: a Dataset and a Taxonomy on Aporophobia
Georgina Curto | Svetlana Kiritchenko | Muhammad Hammad Fahim Siddiqui | Isar Nejadgholi | Kathleen C. Fraser
Findings of the Association for Computational Linguistics: NAACL 2025

Eradicating poverty is the first goal in the U.N. Sustainable Development Goals. However, aporophobia – the societal bias against people living in poverty – constitutes a major obstacle to designing, approving and implementing poverty-mitigation policies. This work presents an initial step towards operationalizing the concept of aporophobia to identify and track harmful beliefs and discriminative actions against poor people on social media. In close collaboration with non-profits and governmental organizations, we conduct data collection and exploration. Then we manually annotate a corpus of English tweets from five world regions for the presence of (1) direct expressions of aporophobia, and (2) statements referring to or criticizing aporophobic views or actions of others, to comprehensively characterize the social media discourse related to bias and discrimination against the poor. Based on the annotated data, we devise a taxonomy of categories of aporophobic attitudes and actions expressed through speech on social media. Finally, we train several classifiers and identify the main challenges for automatic detection of aporophobia in social networks. This work paves the way towards identifying, tracking, and mitigating aporophobic views on social media at scale.

Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Phillip Howard | Kathleen C. Fraser | Anahita Bhiwandiwalla | Svetlana Kiritchenko
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined the social biases contained in text generated by LLMs, this topic has been relatively unexplored in LVLMs. Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images, producing over 57 million responses from popular models. Our multi-dimensional bias evaluation framework reveals that social attributes such as perceived race, gender, and physical characteristics depicted in images can significantly influence the generation of toxic content, competency-associated words, harmful stereotypes, and numerical ratings of individuals.

2024

How Does Stereotype Content Differ across Data Sources?
Kathleen Fraser | Svetlana Kiritchenko | Isar Nejadgholi
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)

For decades, psychologists have been studying stereotypes using specially-designed rating scales to capture people’s beliefs and opinions about different social groups. Now, using NLP tools on extensive collections of text, we have the opportunity to study stereotypes “in the wild” and on a large scale. However, are we truly capturing the same information? In this paper we compare measurements along six psychologically-motivated, stereotype-relevant dimensions (Sociability, Morality, Ability, Assertiveness, Beliefs, and Status) for 10 groups, defined by occupation. We compute these measurements on stereotypical English sentences written by crowd-workers, stereotypical sentences generated by ChatGPT, and more general data collected from social media, and contrast the findings with traditional, survey-based results, as well as a spontaneous word-list generation task. We find that while the correlation with the traditional scales varies across dimensions, the free-text data can be used to specify the particular traits associated with each group, and provide context for numerical survey data.

Adaptable Moral Stances of Large Language Models on Sexist Content: Implications for Society and Gender Discourse
Rongchen Guo | Isar Nejadgholi | Hillary Dawkins | Kathleen C. Fraser | Svetlana Kiritchenko
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

This work provides an explanatory view of how LLMs can apply moral reasoning to both criticize and defend sexist language. We assessed eight large language models, all of which demonstrated the capability to provide explanations grounded in varying moral perspectives for both critiquing and endorsing views that reflect sexist assumptions. With both human and automatic evaluation, we show that all eight models produce comprehensible and contextually relevant text, which is helpful in understanding diverse views on how sexism is perceived. Also, through analysis of moral foundations cited by LLMs in their arguments, we uncover the diverse ideological perspectives in models’ outputs, with some models aligning more with progressive or conservative views on gender roles and sexism.Based on our observations, we caution against the potential misuse of LLMs to justify sexist language. We also highlight that LLMs can serve as tools for understanding the roots of sexist beliefs and designing well-informed interventions. Given this dual capacity, it is crucial to monitor LLMs and design safety mechanisms for their use in applications that involve sensitive societal topics, such as sexism.

Examining Gender and Racial Bias in Large Vision–Language Models Using a Novel Dataset of Parallel Images
Kathleen Fraser | Svetlana Kiritchenko
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Following on recent advances in large language models (LLMs) and subsequent chat models, a new wave of large vision–language models (LVLMs) has emerged. Such models can incorporate images as input in addition to text, and perform tasks such as visual question answering, image captioning, story generation, etc. Here, we examine potential gender and racial biases in such systems, based on the perceived characteristics of the people in the input images. To accomplish this, we present a new dataset PAIRS (PArallel Images for eveRyday Scenarios). The PAIRS dataset contains sets of AI-generated images of people, such that the images are highly similar in terms of background and visual content, but differ along the dimensions of gender (man, woman) and race (Black, white). By querying the LVLMs with such images, we observe significant differences in the responses according to the perceived gender or race of the person depicted.

Challenging Negative Gender Stereotypes: A Study on the Effectiveness of Automated Counter-Stereotypes
Isar Nejadgholi | Kathleen C. Fraser | Anna Kerkhof | Svetlana Kiritchenko
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Gender stereotypes are pervasive beliefs about individuals based on their gender that play a significant role in shaping societal attitudes, behaviours, and even opportunities. Recognizing the negative implications of gender stereotypes, particularly in online communications, this study investigates eleven strategies to automatically counteract and challenge these views. We present AI-generated gender-based counter-stereotypes to (self-identified) male and female study participants and ask them to assess their offensiveness, plausibility, and potential effectiveness. The strategies of counter-facts and broadening universals (i.e., stating that anyone can have a trait regardless of group membership) emerged as the most robust approaches, while humour, perspective-taking, counter-examples, and empathy for the speaker were perceived as less effective. Also, the differences in ratings were more pronounced for stereotypes about the different targets than between the genders of the raters. Alarmingly, many AI-generated counter-stereotypes were perceived as offensive and/or implausible. Our analysis and the collected dataset offer foundational insight into counter-stereotype generation, guiding future efforts to develop strategies that effectively challenge gender stereotypes in online interactions.

The Crime of Being Poor: Associations between Crime and Poverty on Social Media in Eight Countries
Georgina Curto | Svetlana Kiritchenko | Kathleen Fraser | Isar Nejadgholi
Proceedings of the Sixth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS 2024)

Negative public perceptions of people living in poverty can hamper policies and programs that aim to help the poor. One prominent example of social bias and discrimination against people in need is the persistent association of poverty with criminality. The phenomenon has two facets: first, the belief that poor people are more likely to engage in crime (e.g., stealing, mugging, violence) and second, the view that certain behaviors directly resulting from poverty (e.g., living outside, panhandling) warrant criminal punishment. In this paper, we use large language models (LLMs) to identify examples of crime–poverty association (CPA) in English social media texts. We analyze the online discourse on CPA across eight geographically-diverse countries, and find evidence that the CPA rates are higher within the sample obtained from the U.S. and Canada, as compared to the other countries such as South Africa, despite the latter having higher poverty, criminality, and inequality indexes. We further uncover and analyze the most common themes in CPA posts and find more negative and biased attitudes toward people living in poverty in posts from the U.S. and Canada. These results could partially be explained by cultural factors related to the tendency to overestimate the equality of opportunities and social mobility in the U.S. and Canada. These findings have consequences for policy-making and open a new path of research for poverty mitigation with the focus not only on the redistribution of wealth but also on the mitigation of bias and discrimination against people in need.

2023

What Makes a Good Counter-Stereotype? Evaluating Strategies for Automated Responses to Stereotypical Text
Kathleen Fraser | Svetlana Kiritchenko | Isar Nejadgholi | Anna Kerkhof
Proceedings of the First Workshop on Social Influence in Conversations (SICon 2023)

When harmful social stereotypes are expressed on a public platform, they must be addressed in a way that educates and informs both the original poster and other readers, without causing offence or perpetuating new stereotypes. In this paper, we synthesize findings from psychology and computer science to propose a set of potential counter-stereotype strategies. We then automatically generate such counter-stereotypes using ChatGPT, and analyze their correctness and expected effectiveness at reducing stereotypical associations. We identify the strategies of denouncing stereotypes, warning of consequences, and using an empathetic tone as three promising strategies to be further tested.

Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers
Isar Nejadgholi | Svetlana Kiritchenko | Kathleen C. Fraser | Esma Balkir
The 7th Workshop on Online Abuse and Harms (WOAH)

Classifiers tend to learn a false causal relationship between an over-represented concept and a label, which can result in over-reliance on the concept and compromised classification accuracy. It is imperative to have methods in place that can compare different models and identify over-reliances on specific concepts. We consider three well-known abusive language classifiers trained on large English datasets and focus on the concept of negative emotions, which is an important signal but should not be learned as a sufficient feature for the label of abuse. Motivated by the definition of global sufficiency, we first examine the unwanted dependencies learned by the classifiers by assessing their accuracy on a challenge set across all decision thresholds. Further, recognizing that a challenge set might not always be available, we introduce concept-based explanation metrics to assess the influence of the concept on the labels. These explanations allow us to compare classifiers regarding the degree of false global sufficiency they have learned between a concept and a label.

Aporophobia: An Overlooked Type of Toxic Language Targeting the Poor
Svetlana Kiritchenko | Georgina Curto Rex | Isar Nejadgholi | Kathleen C. Fraser
The 7th Workshop on Online Abuse and Harms (WOAH)

While many types of hate speech and online toxicity have been the focus of extensive research in NLP, toxic language stigmatizing poor people has been mostly disregarded. Yet, aporophobia, a social bias against the poor, is a common phenomenon online, which can be psychologically damaging as well as hindering poverty reduction policy measures. We demonstrate that aporophobic attitudes are indeed present in social media and argue that the existing NLP datasets and models are inadequate to effectively address this problem. Efforts toward designing specialized resources and novel socio-technical mechanisms for confronting aporophobia are needed.

2022

Does Moral Code have a Moral Code? Probing Delphi’s Moral Philosophy
Kathleen C. Fraser | Svetlana Kiritchenko | Esma Balkir
Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022)

In an effort to guarantee that machine learning model outputs conform with human moral values, recent work has begun exploring the possibility of explicitly training models to learn the difference between right and wrong. This is typically done in a bottom-up fashion, by exposing the model to different scenarios, annotated with human moral judgements. One question, however, is whether the trained models actually learn any consistent, higher-level ethical principles from these datasets – and if so, what? Here, we probe the Allen AI Delphi model with a set of standardized morality questionnaires, and find that, despite some inconsistencies, Delphi tends to mirror the moral principles associated with the demographic groups involved in the annotation process. We question whether this is desirable and discuss how we might move forward with this knowledge.

Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir | Svetlana Kiritchenko | Isar Nejadgholi | Kathleen Fraser
Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022)

Motivations for methods in explainable artificial intelligence (XAI) often include detecting, quantifying and mitigating bias, and contributing to making machine learning models fairer. However, exactly how an XAI method can help in combating biases is often left unspecified. In this paper, we briefly review trends in explainability and fairness in NLP research, identify the current practices in which explainability methods are applied to detect and mitigate bias, and investigate the barriers preventing XAI methods from being used more widely in tackling fairness issues.

Improving Generalizability in Implicitly Abusive Language Detection with Concept Activation Vectors
Isar Nejadgholi | Kathleen Fraser | Svetlana Kiritchenko
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Robustness of machine learning models on ever-changing real-world data is critical, especially for applications affecting human well-being such as content moderation. New kinds of abusive language continually emerge in online discussions in response to current events (e.g., COVID-19), and the deployed abuse detection systems should be updated regularly to remain accurate. In this paper, we show that general abusive language classifiers tend to be fairly reliable in detecting out-of-domain explicitly abusive utterances but fail to detect new types of more subtle, implicit abuse. Next, we propose an interpretability technique, based on the Testing Concept Activation Vector (TCAV) method from computer vision, to quantify the sensitivity of a trained model to the human-defined concepts of explicit and implicit abusive language, and use that to explain the generalizability of the model on new data, in this case, COVID-related anti-Asian hate speech. Extending this technique, we introduce a novel metric, Degree of Explicitness, for a single instance and show that the new metric is beneficial in suggesting out-of-domain unlabeled examples to effectively enrich the training data with informative, implicitly abusive texts.

Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
Isar Nejadgholi | Esma Balkir | Kathleen Fraser | Svetlana Kiritchenko
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Previous works on the fairness of toxic language classifiers compare the output of models with different identity terms as input features but do not consider the impact of other important concepts present in the context. Here, besides identity terms, we take into account high-level latent features learned by the classifier and investigate the interaction between these features and identity terms. For a multi-class toxic language classifier, we leverage a concept-based explanation framework to calculate the sensitivity of the model to the concept of sentiment, which has been used before as a salient feature for toxic language detection. Our results show that although for some classes, the classifier has learned the sentiment information as expected, this information is outweighed by the influence of identity terms as input features. This work is a step towards evaluating procedural fairness, where unfair processes lead to unfair outcomes. The produced knowledge can guide debiasing techniques to ensure that important concepts besides identity terms are well-represented in training datasets.

Extracting Age-Related Stereotypes from Social Media Texts
Kathleen C. Fraser | Svetlana Kiritchenko | Isar Nejadgholi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Age-related stereotypes are pervasive in our society, and yet have been under-studied in the NLP community. Here, we present a method for extracting age-related stereotypes from Twitter data, generating a corpus of 300,000 over-generalizations about four contemporary generations (baby boomers, generation X, millennials, and generation Z), as well as “old” and “young” people more generally. By employing word-association metrics, semi-supervised topic modelling, and density-based clustering, we uncover many common stereotypes as reported in the media and in the psychological literature, as well as some more novel findings. We also observe trends consistent with the existing literature, namely that definitions of “young” and “old” age appear to be context-dependent, stereotypes for different generations vary across different topics (e.g., work versus family life), and some age-based stereotypes are distinct from generational stereotypes. The method easily extends to other social group labels, and therefore can be used in future work to study stereotypes of different social categories. By better understanding how stereotypes are formed and spread, and by tracking emerging stereotypes, we hope to eventually develop mitigating measures against such biased statements.

Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection
Esma Balkir | Isar Nejadgholi | Kathleen Fraser | Svetlana Kiritchenko
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present a novel feature attribution method for explaining text classifiers, and analyze it in the context of hate speech detection. Although feature attribution models usually provide a single importance score for each token, we instead provide two complementary and theoretically-grounded scores – necessity and sufficiency – resulting in more informative explanations. We propose a transparent method that calculates these values by generating explicit perturbations of the input text, allowing the importance scores themselves to be explainable. We employ our method to explain the predictions of different hate speech detection models on the same set of curated examples from a test suite, and show that different values of necessity and sufficiency for identity terms correspond to different kinds of false positive errors, exposing sources of classifier bias against marginalized groups.

2021

Understanding and Countering Stereotypes: A Computational Approach to the Stereotype Content Model
Kathleen C. Fraser | Isar Nejadgholi | Svetlana Kiritchenko
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Stereotypical language expresses widely-held beliefs about different social categories. Many stereotypes are overtly negative, while others may appear positive on the surface, but still lead to negative consequences. In this work, we present a computational approach to interpreting stereotypes in text through the Stereotype Content Model (SCM), a comprehensive causal theory from social psychology. The SCM proposes that stereotypes can be understood along two primary dimensions: warmth and competence. We present a method for defining warmth and competence axes in semantic embedding space, and show that the four quadrants defined by this subspace accurately represent the warmth and competence concepts, according to annotated lexicons. We then apply our computational SCM model to textual stereotype data and show that it compares favourably with survey-based studies in the psychological literature. Furthermore, we explore various strategies to counter stereotypical beliefs with anti-stereotypes. It is known that countering stereotypes with anti-stereotypical examples is one of the most effective ways to reduce biased thinking, yet the problem of generating anti-stereotypes has not been previously studied. Thus, a better understanding of how to generate realistic and effective anti-stereotypes can contribute to addressing pressing societal concerns of stereotyping, prejudice, and discrimination.

2020

SOLO: A Corpus of Tweets for Examining the State of Being Alone
Svetlana Kiritchenko | Will Hipson | Robert Coplan | Saif M. Mohammad
Proceedings of the Twelfth Language Resources and Evaluation Conference

The state of being alone can have a substantial impact on our lives, though experiences with time alone diverge significantly among individuals. Psychologists distinguish between the concept of solitude, a positive state of voluntary aloneness, and the concept of loneliness, a negative state of dissatisfaction with the quality of one’s social interactions. Here, for the first time, we conduct a large-scale computational analysis to explore how the terms associated with the state of being alone are used in online language. We present SOLO (State of Being Alone), a corpus of over 4 million tweets collected with query terms solitude, lonely, and loneliness. We use SOLO to analyze the language and emotions associated with the state of being alone. We show that the term solitude tends to co-occur with more positive, high-dominance words (e.g., enjoy, bliss) while the terms lonely and loneliness frequently co-occur with negative, low-dominance words (e.g., scared, depressed), which confirms the conceptual distinctions made in psychology. We also show that women are more likely to report on negative feelings of being lonely as compared to men, and there are more teenagers among the tweeters that use the word lonely than among the tweeters that use the word solitude.

On Cross-Dataset Generalization in Automatic Detection of Online Abuse
Isar Nejadgholi | Svetlana Kiritchenko
Proceedings of the Fourth Workshop on Online Abuse and Harms

NLP research has attained high performances in abusive language detection as a supervised classification task. While in research settings, training and test datasets are usually obtained from similar data samples, in practice systems are often applied on data that are different from the training set in topic and class distributions. Also, the ambiguity in class definitions inherited in this task aggravates the discrepancies between source and target datasets. We explore the topic bias and the task formulation bias in cross-dataset generalization. We show that the benign examples in the Wikipedia Detox dataset are biased towards platform-specific topics. We identify these examples using unsupervised topic modeling and manual inspection of topics’ keywords. Removing these topics increases cross-dataset generalization, without reducing in-domain classification performance. For a robust dataset design, we suggest applying inexpensive unsupervised methods to inspect the collected data and downsize the non-generalizable content before manually annotating for class labels.

2019

Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition
Shima Asaadi | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Bigrams (two-word sequences) hold a special place in semantic composition research since they are the smallest unit formed by composing words. A semantic relatedness dataset that includes bigrams will thus be useful in the development of automatic methods of semantic composition. However, existing relatedness datasets only include pairs of unigrams (single words). Further, existing datasets were created using rating scales and thus suffer from limitations such as in consistent annotations and scale region bias. In this paper, we describe how we created a large, fine-grained, bigram relatedness dataset (BiRD), using a comparative annotation technique called Best–Worst Scaling. Each of BiRD’s 3,345 English term pairs involves at least one bigram. We show that the relatedness scores obtained are highly reliable (split-half reliability r= 0.937). We analyze the data to obtain insights into bigram semantic relatedness. Finally, we present benchmark experiments on using the relatedness dataset as a testbed to evaluate simple unsupervised measures of semantic composition. BiRD is made freely available to foster further research on how meaning can be represented and how meaning can be composed.

2018

WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art
Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories
Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DeepMiner at SemEval-2018 Task 1: Emotion Intensity Recognition Using Deep Representation Learning
Habibeh Naderi | Behrouz Haji Soleimani | Saif Mohammad | Svetlana Kiritchenko | Stan Matwin
Proceedings of the 12th International Workshop on Semantic Evaluation

In this paper, we propose a regression system to infer the emotion intensity of a tweet. We develop a multi-aspect feature learning mechanism to capture the most discriminative semantic features of a tweet as well as the emotion information conveyed by each word in it. We combine six types of feature groups: (1) a tweet representation learned by an LSTM deep neural network on the training data, (2) a tweet representation learned by an LSTM network on a large corpus of tweets that contain emotion words (a distant supervision corpus), (3) word embeddings trained on the distant supervision corpus and averaged over all words in a tweet, (4) word and character n-grams, (5) features derived from various sentiment and emotion lexicons, and (6) other hand-crafted features. As part of the word embedding training, we also learn the distributed representations of multi-word expressions (MWEs) and negated forms of words. An SVR regressor is then trained over the full set of features. We evaluate the effectiveness of our ensemble feature sets on the SemEval-2018 Task 1 datasets and achieve a Pearson correlation of 72% on the task of tweet emotion intensity prediction.

Quantifying Qualitative Data for Understanding Controversial Issues
Michael Wojatzki | Saif Mohammad | Torsten Zesch | Svetlana Kiritchenko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Agree or Disagree: Predicting Judgments on Nuanced Assertions
Michael Wojatzki | Torsten Zesch | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Being able to predict whether people agree or disagree with an assertion (i.e. an explicit, self-contained statement) has several applications ranging from predicting how many people will like or dislike a social media post to classifying posts based on whether they are in accordance with a particular point of view. We formalize this as two NLP tasks: predicting judgments of (i) individuals and (ii) groups based on the text of the assertion and previous judgments. We evaluate a wide range of approaches on a crowdsourced data set containing over 100,000 judgments on over 2,000 assertions. We find that predicting individual judgments is a hard task with our best results only slightly exceeding a majority baseline, but that judgments of groups can be more reliably predicted using a Siamese neural network, which outperforms all other approaches by a wide margin.

Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems. Further, there is no benchmark dataset for examining inappropriate biases in systems. Here for the first time, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We use the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018 Task 1 ‘Affect in Tweets’. We find that several of the systems show statistically significant bias; that is, they consistently provide slightly higher sentiment intensity predictions for one race or one gender. We make the EEC freely available.

SemEval-2018 Task 1: Affect in Tweets
Saif Mohammad | Felipe Bravo-Marquez | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of the 12th International Workshop on Semantic Evaluation

We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (sentiment) regression, 4. valence ordinal classification, and 5. emotion classification. Seventy-five teams (about 200 team members) participated in the shared task. We summarize the methods, resources, and tools used by the participating teams, with a focus on the techniques and resources that are particularly useful. We also analyze systems for consistent bias towards a particular race or gender. The data is made freely available to further improve our understanding of how people convey emotions through language.

2017

Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Rating scales are a widely used method for data annotation; however, they present several challenges, such as difficulty in maintaining inter- and intra-annotator consistency. Best–worst scaling (BWS) is an alternative method of annotation that is claimed to produce high-quality annotations while keeping the required number of annotations similar to that of rating scales. However, the veracity of this claim has never been systematically established. Here for the first time, we set up an experiment that directly compares the rating scale method with BWS. We show that with the same total number of annotations, BWS produces significantly more reliable results than the rating scale.

2016

Sentiment Composition of Words with Opposing Polarities
Svetlana Kiritchenko | Saif M. Mohammad
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Sentiment Lexicons for Arabic Social Media
Saif Mohammad | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Existing Arabic sentiment lexicons have low coverage―with only a few thousand entries. In this paper, we present several large sentiment lexicons that were automatically generated using two different methods: (1) by using distant supervision techniques on Arabic tweets, and (2) by translating English sentiment lexicons into Arabic using a freely available statistical machine translation system. We compare the usefulness of new and old sentiment lexicons in the downstream application of sentence-level sentiment analysis. Our baseline sentiment analysis system uses numerous surface form features. Nonetheless, the system benefits from using additional features drawn from sentiment lexicons. The best result is obtained using the automatically generated Dialectal Hashtag Lexicon and the Arabic translations of the NRC Emotion Lexicon (accuracy of 66.6%). Finally, we describe a qualitative study of the automatic translations of English sentiment lexicons into Arabic, which shows that about 88% of the automatically translated entries are valid for English as well. Close to 10% of the invalid entries are caused by gross mistranslations, close to 40% by translations into a related word, and about 50% by differences in how the word is used in Arabic.

Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Sentiment composition is the determining of sentiment of a multi-word linguistic unit, such as a phrase or a sentence, based on its constituents. We focus on sentiment composition in phrases formed by at least one positive and at least one negative word ― phrases like ‘happy accident’ and ‘best winter break’. We refer to such phrases as opposing polarity phrases. We manually annotate a collection of opposing polarity phrases and their constituent single words with real-valued sentiment intensity scores using a method known as Best―Worst Scaling. We show that the obtained annotations are consistent. We explore the entries in the lexicon for linguistic regularities that govern sentiment composition in opposing polarity phrases. Finally, we list the current and possible future applications of the lexicon.

A Dataset for Detecting Stance in Tweets
Saif Mohammad | Svetlana Kiritchenko | Parinaz Sobhani | Xiaodan Zhu | Colin Cherry
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We can often detect from a person’s utterances whether he/she is in favor of or against a given target entity (a product, topic, another person, etc.). Here for the first time we present a dataset of tweets annotated for whether the tweeter is in favor of or against pre-chosen targets of interest―their stance. The targets of interest may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. The data pertains to six targets of interest commonly known and debated in the United States. Apart from stance, the tweets are also annotated for whether the target of interest is the target of opinion in the tweet. The annotations were performed by crowdsourcing. Several techniques were employed to encourage high-quality annotations (for example, providing clear and simple instructions) and to identify and discard poor annotations (for example, using a small set of check questions annotated by the authors). This Stance Dataset, which was subsequently also annotated for sentiment, can be used to better understand the relationship between stance, sentiment, entity relationships, and textual inference.

The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

SemEval-2016 Task 6: Detecting Stance in Tweets
Saif Mohammad | Svetlana Kiritchenko | Parinaz Sobhani | Xiaodan Zhu | Colin Cherry
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best–Worst Scaling
Svetlana Kiritchenko | Saif M. Mohammad
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Detecting Stance in Tweets And Analyzing its Interaction with Sentiment
Parinaz Sobhani | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases
Svetlana Kiritchenko | Saif Mohammad | Mohammad Salameh
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

SemEval-2015 Task 10: Sentiment Analysis in Twitter
Sara Rosenthal | Preslav Nakov | Svetlana Kiritchenko | Saif Mohammad | Alan Ritter | Veselin Stoyanov
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Sentiment after Translation: A Case-Study on Arabic Social Media Posts
Mohammad Salameh | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews
Svetlana Kiritchenko | Xiaodan Zhu | Colin Cherry | Saif Mohammad
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

NRC-Canada-2014: Recent Improvements in the Sentiment Analysis of Tweets
Xiaodan Zhu | Svetlana Kiritchenko | Saif Mohammad
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

An Empirical Study on the Effect of Negation Words on Sentiment
Xiaodan Zhu | Hongyu Guo | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets
Saif Mohammad | Svetlana Kiritchenko | Xiaodan Zhu
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2011

Lexically-Triggered Hidden Markov Models for Clinical Document Coding
Svetlana Kiritchenko | Colin Cherry
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Venues