Saif Mohammad

Also published as: Saif M. Mohammad


2022

pdf bib
Ethics Sheets for AI Tasks
Saif Mohammad
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Several high-profile events, such as the mass testing of emotion recognition systems on vulnerable sub-populations and using question answering systems to make moral judgments, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. At issue here are not just individual systems and datasets, but also the AI tasks themselves. In this position paper, I make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. I will also present a template for ethics sheets with 50 ethical considerations, using the task of emotion recognition as a running example. Ethics sheets are a mechanism to engage with and document ethical considerations before building datasets and systems. Similar to survey articles, a small number of carefully created ethics sheets can serve numerous researchers and developers.

2021

pdf bib
Ruddit: Norms of Offensiveness for English Reddit Comments
Rishav Hada | Sohi Sudhir | Pushkar Mishra | Helen Yannakoudakis | Saif M. Mohammad | Ekaterina Shutova
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

On social media platforms, hateful and offensive language negatively impacts the mental well-being of users and the participation of people from diverse backgrounds. Automatic methods to detect offensive language have largely relied on datasets with categorical labels. However, comments can vary in their degree of offensiveness. We create the first dataset of English language Reddit comments that has fine-grained, real-valued scores between -1 (maximally supportive) and 1 (maximally offensive). The dataset was annotated using Best–Worst Scaling, a form of comparative annotation that has been shown to alleviate known biases of using rating scales. We show that the method produces highly reliable offensiveness scores. Finally, we evaluate the ability of widely-used neural models to predict offensiveness scores on this new dataset.

2020

pdf bib
Examining Citations of Natural Language Processing Literature
Saif M. Mohammad
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We extracted information from the ACL Anthology (AA) and Google Scholar (GS) to examine trends in citations of NLP papers. We explore questions such as: How well cited are papers of different types (journal articles, conference papers, demo papers, etc.)? How well cited are papers from different areas within NLP? Notably, we show that only about 56% of the papers in AA are cited ten or more times. CL Journal has the most cited papers, but its citation dominance has lessened in recent years. On average, long papers get almost three times as many citations as short papers; and papers on sentiment classification, anaphora resolution, and entity recognition have the highest median citations. The analyses presented here, and the associated dataset of NLP papers mapped to citations, have a number of uses including: understanding how the field is growing and quantifying the impact of different types of papers.

pdf bib
Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations
Saif M. Mohammad
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Disparities in authorship and citations across gender can have substantial adverse consequences not just on the disadvantaged genders, but also on the field of study as a whole. Measuring gender gaps is a crucial step towards addressing them. In this work, we examine female first author percentages and the citations to their papers in Natural Language Processing (1965 to 2019). We determine aggregate-level statistics using an existing manually curated author–gender list as well as first names strongly associated with a gender. We find that only about 29% of first authors are female and only about 25% of last authors are female. Notably, this percentage has not improved since the mid-2000s. We also show that, on average, female first authors are cited less than male first authors, even when controlling for experience and area of research. Finally, we discuss the ethical considerations involved in automatic demographic analysis.

pdf bib
NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature
Saif M. Mohammad
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

As part of the NLP Scholar project, we created a single unified dataset of NLP papers and their meta-information (including citation numbers), by extracting and aligning information from the ACL Anthology and Google Scholar. In this paper, we describe several interconnected interactive visualizations (dashboards) that present various aspects of the data. Clicking on an item within a visualization or entering query terms in the search boxes filters the data in all visualizations in the dashboard. This allows users to search for papers in the area of their interest, published within specific time periods, published by specified authors, etc. The interactive visualizations presented here, and the associated dataset of papers mapped to citations, have additional uses as well, including understanding how the field is growing (both overall and across sub-areas) and quantifying the impact of different types of papers on subsequent publications.

pdf bib
NLP Scholar: A Dataset for Examining the State of NLP Research
Saif M. Mohammad
Proceedings of the 12th Language Resources and Evaluation Conference

Google Scholar is the largest web search engine for academic literature that also provides access to rich metadata associated with the papers. The ACL Anthology (AA) is the largest repository of articles on Natural Language Processing (NLP). We extracted information from AA for about 44 thousand NLP papers and identified authors who published at least three papers there. We then extracted citation information from Google Scholar for all their papers (not just their AA papers). This resulted in a dataset of 1.1 million papers and associated Google Scholar information. We aligned the information in the AA and Google Scholar datasets to create the NLP Scholar Dataset – a single unified source of information (from both AA and Google Scholar) for tens of thousands of NLP papers. It can be used to identify broad trends in productivity, focus, and impact of NLP research. We present here initial work on analyzing the volume of research in NLP over the years and identifying the most cited papers in NLP. We also list a number of additional potential applications.

pdf bib
SOLO: A Corpus of Tweets for Examining the State of Being Alone
Svetlana Kiritchenko | Will Hipson | Robert Coplan | Saif M. Mohammad
Proceedings of the 12th Language Resources and Evaluation Conference

The state of being alone can have a substantial impact on our lives, though experiences with time alone diverge significantly among individuals. Psychologists distinguish between the concept of solitude, a positive state of voluntary aloneness, and the concept of loneliness, a negative state of dissatisfaction with the quality of one’s social interactions. Here, for the first time, we conduct a large-scale computational analysis to explore how the terms associated with the state of being alone are used in online language. We present SOLO (State of Being Alone), a corpus of over 4 million tweets collected with query terms solitude, lonely, and loneliness. We use SOLO to analyze the language and emotions associated with the state of being alone. We show that the term solitude tends to co-occur with more positive, high-dominance words (e.g., enjoy, bliss) while the terms lonely and loneliness frequently co-occur with negative, low-dominance words (e.g., scared, depressed), which confirms the conceptual distinctions made in psychology. We also show that women are more likely to report on negative feelings of being lonely as compared to men, and there are more teenagers among the tweeters that use the word lonely than among the tweeters that use the word solitude.

pdf bib
PoKi: A Large Dataset of Poems by Children
Will Hipson | Saif M. Mohammad
Proceedings of the 12th Language Resources and Evaluation Conference

Child language studies are crucial in improving our understanding of child well-being; especially in determining the factors that impact happiness, the sources of anxiety, techniques of emotion regulation, and the mechanisms to cope with stress. However, much of this research is stymied by the lack of availability of large child-written texts. We present a new corpus of child-written text, PoKi, which includes about 62 thousand poems written by children from grades 1 to 12. PoKi is especially useful in studying child language because it comes with information about the age of the child authors (their grade). We analyze the words in PoKi along several emotion dimensions (valence, arousal, dominance) and discrete emotions (anger, fear, sadness, joy). We use non-parametric regressions to model developmental differences from early childhood to late adolescence. Results show decreases in valence that are especially pronounced during mid-adolescence, while arousal and dominance peak during adolescence. Gender differences in the developmental trajectory of emotions are also observed. Our results support and extend the current state of emotion development research.

pdf bib
WordWars: A Dataset to Examine the Natural Selection of Words
Saif M. Mohammad
Proceedings of the 12th Language Resources and Evaluation Conference

There is a growing body of work on how word meaning changes over time: mutation. In contrast, there is very little work on how different words compete to represent the same meaning, and how the degree of success of words in that competition changes over time: natural selection. We present a new dataset, WordWars, with historical frequency data from the early 1800s to the early 2000s for monosemous English words in over 5000 synsets. We explore three broad questions with the dataset: (1) what is the degree to which predominant words in these synsets have changed, (2) how do prominent word features such as frequency, length, and concreteness impact natural selection, and (3) what are the differences between the predominant words of the 2000s and the predominant words of early 1800s. We show that close to one third of the synsets undergo a change in the predominant word in this time period. Manual annotation of these pairs shows that about 15% of these are orthographic variations, 25% involve affix changes, and 60% have completely different roots. We find that frequency, length, and concreteness all impact natural selection, albeit in different ways.

2019

pdf bib
Proceedings of the 13th International Workshop on Semantic Evaluation
Jonathan May | Ekaterina Shutova | Aurelie Herbelot | Xiaodan Zhu | Marianna Apidianaki | Saif M. Mohammad
Proceedings of the 13th International Workshop on Semantic Evaluation

pdf bib
How do we feel when a robot dies? Emotions expressed on Twitter before and after hitchBOT’s destruction
Kathleen C. Fraser | Frauke Zeller | David Harris Smith | Saif Mohammad | Frank Rudzicz
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In 2014, a chatty but immobile robot called hitchBOT set out to hitchhike across Canada. It similarly made its way across Germany and the Netherlands, and had begun a trip across the USA when it was destroyed by vandals. In this work, we analyze the emotions and sentiments associated with words in tweets posted before and after hitchBOT’s destruction to answer two questions: Were there any differences in the emotions expressed across the different countries visited by hitchBOT? And how did the public react to the demise of hitchBOT? Our analyses indicate that while there were few cross-cultural differences in sentiment towards hitchBOT, there was a significant negative emotional reaction to its destruction, suggesting that people had formed an emotional connection with hitchBOT and perceived its destruction as morally wrong. We discuss potential implications of anthropomorphism and emotional attachment to robots from the perspective of robot ethics.

pdf bib
Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition
Shima Asaadi | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Bigrams (two-word sequences) hold a special place in semantic composition research since they are the smallest unit formed by composing words. A semantic relatedness dataset that includes bigrams will thus be useful in the development of automatic methods of semantic composition. However, existing relatedness datasets only include pairs of unigrams (single words). Further, existing datasets were created using rating scales and thus suffer from limitations such as inconsistent annotations and scale region bias. In this paper, we describe how we created a large, fine-grained, bigram relatedness dataset (BiRD), using a comparative annotation technique called Best–Worst Scaling. Each of BiRD’s 3,345 English term pairs involves at least one bigram. We show that the relatedness scores obtained are highly reliable (split-half reliability r = 0.937). We analyze the data to obtain insights into bigram semantic relatedness. Finally, we present benchmark experiments on using the relatedness dataset as a testbed to evaluate simple unsupervised measures of semantic composition. BiRD is made freely available to foster further research on how meaning can be represented and how meaning can be composed.
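
As an illustration of the split-half reliability figure quoted in the abstract above, the following is a minimal Python sketch of the general idea: randomly split each item's judgments into two halves, score the item from each half, and correlate the two resulting score lists. It is not the authors' code. The per-item numeric judgments (+1, 0, -1), the function names, and the number of trials are all assumptions made for this example; the paper's actual procedure may differ in how annotations are split and aggregated.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def split_half_reliability(judgments, trials=100, seed=0):
    """Average correlation between scores computed from random halves.

    judgments: dict mapping an item to a list of numeric judgments
    (a hypothetical, simplified data layout chosen for this sketch).
    """
    rng = random.Random(seed)
    items = sorted(judgments)
    total = 0.0
    for _ in range(trials):
        first, second = [], []
        for item in items:
            votes = list(judgments[item])
            rng.shuffle(votes)
            half = len(votes) // 2
            # Score the item independently from each half of its judgments.
            first.append(sum(votes[:half]) / half)
            second.append(sum(votes[half:]) / (len(votes) - half))
        total += pearson(first, second)
    return total / trials


# Toy data (invented): annotators fully agree, so both halves match exactly.
consistent = {"a": [1, 1, 1, 1], "b": [0, 0, 0, 0], "c": [-1, -1, -1, -1]}
# Toy data (invented): some disagreement among annotators.
noisy = {"a": [1, 1, 1, 0], "b": [0, 0, 1, -1], "c": [-1, -1, -1, 0], "d": [1, 0, 0, -1]}

r_consistent = split_half_reliability(consistent)
r_noisy = split_half_reliability(noisy)
```

A value close to 1 indicates that repeating the annotation with a different set of judgments would yield nearly the same ranking of items.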

2018

pdf bib
Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words
Saif Mohammad
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Words play a central role in language and thought. Factor analysis studies have shown that the primary dimensions of meaning are valence, arousal, and dominance (VAD). We present the NRC VAD Lexicon, which has human ratings of valence, arousal, and dominance for more than 20,000 English words. We use Best–Worst Scaling to obtain fine-grained scores and address issues of annotation consistency that plague traditional rating scale methods of annotation. We show that the ratings obtained are vastly more reliable than those in existing lexicons. We also show that there exist statistically significant differences in the shared understanding of valence, arousal, and dominance across demographic variables such as age, gender, and personality.

pdf bib
Word Affect Intensities
Saif Mohammad
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories
Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art
Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Quantifying Qualitative Data for Understanding Controversial Issues
Michael Wojatzki | Saif Mohammad | Torsten Zesch | Svetlana Kiritchenko
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Proceedings of The 12th International Workshop on Semantic Evaluation
Marianna Apidianaki | Saif M. Mohammad | Jonathan May | Ekaterina Shutova | Steven Bethard | Marine Carpuat
Proceedings of The 12th International Workshop on Semantic Evaluation

pdf bib
SemEval-2018 Task 1: Affect in Tweets
Saif Mohammad | Felipe Bravo-Marquez | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of The 12th International Workshop on Semantic Evaluation

We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (sentiment) regression, 4. valence ordinal classification, and 5. emotion classification. Seventy-five teams (about 200 team members) participated in the shared task. We summarize the methods, resources, and tools used by the participating teams, with a focus on the techniques and resources that are particularly useful. We also analyze systems for consistent bias towards a particular race or gender. The data is made freely available to further improve our understanding of how people convey emotions through language.

pdf bib
DeepMiner at SemEval-2018 Task 1: Emotion Intensity Recognition Using Deep Representation Learning
Habibeh Naderi | Behrouz Haji Soleimani | Saif Mohammad | Svetlana Kiritchenko | Stan Matwin
Proceedings of The 12th International Workshop on Semantic Evaluation

In this paper, we propose a regression system to infer the emotion intensity of a tweet. We develop a multi-aspect feature learning mechanism to capture the most discriminative semantic features of a tweet as well as the emotion information conveyed by each word in it. We combine six types of feature groups: (1) a tweet representation learned by an LSTM deep neural network on the training data, (2) a tweet representation learned by an LSTM network on a large corpus of tweets that contain emotion words (a distant supervision corpus), (3) word embeddings trained on the distant supervision corpus and averaged over all words in a tweet, (4) word and character n-grams, (5) features derived from various sentiment and emotion lexicons, and (6) other hand-crafted features. As part of the word embedding training, we also learn the distributed representations of multi-word expressions (MWEs) and negated forms of words. An SVR regressor is then trained over the full set of features. We evaluate the effectiveness of our ensemble feature sets on the SemEval-2018 Task 1 datasets and achieve a Pearson correlation of 72% on the task of tweet emotion intensity prediction.

pdf bib
Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems. Further, there is no benchmark dataset for examining inappropriate biases in systems. Here for the first time, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We use the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018 Task 1 ‘Affect in Tweets’. We find that several of the systems show statistically significant bias; that is, they consistently provide slightly higher sentiment intensity predictions for one race or one gender. We make the EEC freely available.

pdf bib
Agree or Disagree: Predicting Judgments on Nuanced Assertions
Michael Wojatzki | Torsten Zesch | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Being able to predict whether people agree or disagree with an assertion (i.e. an explicit, self-contained statement) has several applications ranging from predicting how many people will like or dislike a social media post to classifying posts based on whether they are in accordance with a particular point of view. We formalize this as two NLP tasks: predicting judgments of (i) individuals and (ii) groups based on the text of the assertion and previous judgments. We evaluate a wide range of approaches on a crowdsourced data set containing over 100,000 judgments on over 2,000 assertions. We find that predicting individual judgments is a hard task with our best results only slightly exceeding a majority baseline, but that judgments of groups can be more reliably predicted using a Siamese neural network, which outperforms all other approaches by a wide margin.

pdf bib
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Saif M. Mohammad | Veronique Hoste | Roman Klinger
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
IEST: WASSA-2018 Implicit Emotions Shared Task
Roman Klinger | Orphée De Clercq | Saif Mohammad | Alexandra Balahur
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Past shared tasks on emotions use data with both overt expressions of emotions (I am so happy to see you!) as well as subtle expressions where the emotions have to be inferred, for instance from event descriptions. Further, most datasets do not focus on the cause or the stimulus of the emotion. Here, for the first time, we propose a shared task where systems have to predict the emotions in a large automatically labeled dataset of tweets without access to words denoting emotions. Based on this intention, we call this the Implicit Emotion Shared Task (IEST) because the systems have to infer the emotion mostly from the context. Every tweet has an occurrence of an explicit emotion word that is masked. The tweets are collected in a manner such that they are likely to include a description of the cause of the emotion – the stimulus. Altogether, 30 teams submitted results which range from macro F1 scores of 21% to 71%. The baseline (Max-Ent bag of words and bigrams) obtains an F1 score of 60%, which was available to the participants during the development phase. A study with human annotators suggests that automatic methods outperform human predictions, possibly by homing in on subtle textual clues not used by humans. Corpora, resources, and results are available at the shared task website at http://implicitemotions.wassa2018.com.

2017

pdf bib
Emotion Intensities in Tweets
Saif Mohammad | Felipe Bravo-Marquez
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

This paper examines the task of detecting intensity of emotion from text. We create the first datasets of tweets annotated for anger, fear, joy, and sadness intensities. We use a technique called best–worst scaling (BWS) that improves annotation consistency and obtains reliable fine-grained scores. We show that emotion-word hashtags often impact emotion intensity, usually conveying a more intense emotion. Finally, we create a benchmark regression system and conduct experiments to determine: which features are useful for detecting emotion intensity; and, the extent to which two emotions are similar in terms of how they manifest in language.

pdf bib
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Steven Bethard | Marine Carpuat | Marianna Apidianaki | Saif M. Mohammad | Daniel Cer | David Jurgens
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

pdf bib
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Saif M. Mohammad | Erik van der Goot
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
WASSA-2017 Shared Task on Emotion Intensity
Saif Mohammad | Felipe Bravo-Marquez
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

We present the first shared task on detecting the intensity of emotion felt by the speaker of a tweet. We create the first datasets of tweets annotated for anger, fear, joy, and sadness intensities using a technique called best–worst scaling (BWS). We show that the annotations lead to reliable fine-grained intensity scores (rankings of tweets by intensity). The data was partitioned into training, development, and test sets for the competition. Twenty-two teams participated in the shared task, with the best system obtaining a Pearson correlation of 0.747 with the gold intensity scores. We summarize the machine learning setups, resources, and tools used by the participating teams, with a focus on the techniques and resources that are particularly useful for the task. The emotion intensity dataset and the shared task are helping improve our understanding of how we convey more or less intense emotions through language.

pdf bib
Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Rating scales are a widely used method for data annotation; however, they present several challenges, such as difficulty in maintaining inter- and intra-annotator consistency. Best–worst scaling (BWS) is an alternative method of annotation that is claimed to produce high-quality annotations while keeping the required number of annotations similar to that of rating scales. However, the veracity of this claim has never been systematically established. Here for the first time, we set up an experiment that directly compares the rating scale method with BWS. We show that with the same total number of annotations, BWS produces significantly more reliable results than the rating scale.
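
For readers unfamiliar with Best–Worst Scaling, the sketch below shows one common way of turning BWS annotations into real-valued scores: for each item, the fraction of times it was chosen as best minus the fraction of times it was chosen as worst, which yields a value between -1 and 1. This is a simple counting sketch under stated assumptions, not code from the paper. The function name, the tuple layout of an annotation, and the toy data are all invented for illustration.

```python
from collections import Counter


def bws_scores(annotations):
    """Counting-based scores from Best-Worst Scaling annotations.

    annotations: iterable of (items, best, worst), where `items` is the
    tuple of items shown together and `best`/`worst` are the annotator's
    picks (a hypothetical layout chosen for this sketch).
    Returns a dict: item -> (#best - #worst) / #times shown.
    """
    seen, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        if b not in items or w not in items or b == w:
            raise ValueError("best and worst must be distinct members of the tuple")
        seen.update(items)  # every item in the tuple was shown once
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


# Toy annotations (invented): which word in each 4-tuple is most/least positive?
annotations = [
    (("great", "okay", "bad", "awful"), "great", "awful"),
    (("great", "fine", "bad", "okay"), "great", "bad"),
    (("fine", "okay", "awful", "bad"), "fine", "awful"),
]
scores = bws_scores(annotations)
```

Here "great" is always picked as best and "awful" always as worst, so they land at the two ends of the scale, while "okay" is never picked either way and stays at zero.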

2016

pdf bib
The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
A Practical Guide to Sentiment Annotation: Challenges and Solutions
Saif Mohammad
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best–Worst Scaling
Svetlana Kiritchenko | Saif M. Mohammad
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Sentiment Composition of Words with Opposing Polarities
Svetlana Kiritchenko | Saif M. Mohammad
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
SemEval-2016 Task 6: Detecting Stance in Tweets
Saif Mohammad | Svetlana Kiritchenko | Parinaz Sobhani | Xiaodan Zhu | Colin Cherry
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases
Svetlana Kiritchenko | Saif Mohammad | Mohammad Salameh
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Metaphor as a Medium for Emotion: An Empirical Study
Saif Mohammad | Ekaterina Shutova | Peter Turney
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

pdf bib
Detecting Stance in Tweets And Analyzing its Interaction with Sentiment
Parinaz Sobhani | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

pdf bib
Sentiment Lexicons for Arabic Social Media
Saif Mohammad | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Existing Arabic sentiment lexicons have low coverage, with only a few thousand entries. In this paper, we present several large sentiment lexicons that were automatically generated using two different methods: (1) by using distant supervision techniques on Arabic tweets, and (2) by translating English sentiment lexicons into Arabic using a freely available statistical machine translation system. We compare the usefulness of new and old sentiment lexicons in the downstream application of sentence-level sentiment analysis. Our baseline sentiment analysis system uses numerous surface form features. Nonetheless, the system benefits from using additional features drawn from sentiment lexicons. The best result is obtained using the automatically generated Dialectal Hashtag Lexicon and the Arabic translations of the NRC Emotion Lexicon (accuracy of 66.6%). Finally, we describe a qualitative study of the automatic translations of English sentiment lexicons into Arabic, which shows that about 88% of the automatically translated entries are valid for Arabic as well. Close to 10% of the invalid entries are caused by gross mistranslations, close to 40% by translations into a related word, and about 50% by differences in how the word is used in Arabic.

pdf bib
Happy Accident: A Sentiment Composition Lexicon for Opposing Polarity Phrases
Svetlana Kiritchenko | Saif Mohammad
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Sentiment composition is the determining of sentiment of a multi-word linguistic unit, such as a phrase or a sentence, based on its constituents. We focus on sentiment composition in phrases formed by at least one positive and at least one negative word, such as ‘happy accident’ and ‘best winter break’. We refer to such phrases as opposing polarity phrases. We manually annotate a collection of opposing polarity phrases and their constituent single words with real-valued sentiment intensity scores using a method known as Best–Worst Scaling. We show that the obtained annotations are consistent. We explore the entries in the lexicon for linguistic regularities that govern sentiment composition in opposing polarity phrases. Finally, we list the current and possible future applications of the lexicon.

pdf bib
A Dataset for Detecting Stance in Tweets
Saif Mohammad | Svetlana Kiritchenko | Parinaz Sobhani | Xiaodan Zhu | Colin Cherry
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We can often detect from a person’s utterances whether he/she is in favor of or against a given target entity (a product, topic, another person, etc.). Here for the first time we present a dataset of tweets annotated for whether the tweeter is in favor of or against pre-chosen targets of interest, that is, their stance. The targets of interest may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. The data pertains to six targets of interest commonly known and debated in the United States. Apart from stance, the tweets are also annotated for whether the target of interest is the target of opinion in the tweet. The annotations were performed by crowdsourcing. Several techniques were employed to encourage high-quality annotations (for example, providing clear and simple instructions) and to identify and discard poor annotations (for example, using a small set of check questions annotated by the authors). This Stance Dataset, which was subsequently also annotated for sentiment, can be used to better understand the relationship between stance, sentiment, entity relationships, and textual inference.

2015

Computational Analysis of Affect and Emotion in Language
Saif Mohammad | Cecilia Ovesdotter Alm
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Computational linguistics has witnessed a surge of interest in approaches to emotion and affect analysis, tackling problems that extend beyond sentiment analysis in depth and complexity. This area involves basic emotions (such as joy, sadness, and fear) as well as any of the hundreds of other emotions humans are capable of (such as optimism, frustration, and guilt), expanding into affective conditions, experiences, and activities. Leveraging linguistic data for computational affect and emotion inference enables opportunities to address a range of affect-related tasks, problems, and non-invasive applications that capture aspects essential to the human condition and individuals’ cognitive processes. These efforts enable and facilitate human-centered computing experiences, as demonstrated by applications across clinical, socio-political, artistic, educational, and commercial domains. Efforts to computationally detect, characterize, and generate emotions or affect-related phenomena respond equally to technological needs for personalized, micro-level analytics and broad-coverage, macro-level inference, and they have involved both small and massive amounts of data. While this is an exciting area with numerous opportunities for members of the ACL community, a major obstacle is its intersection with other investigatory traditions, necessitating knowledge transfer. This tutorial comprehensively integrates relevant concepts and frameworks from linguistics, cognitive science, affective computing, and computational linguistics in order to equip researchers and practitioners with the adequate background and knowledge to work effectively on problems and tasks either directly involving, or benefiting from having an understanding of, affect and emotion analysis. There is a substantial body of work in traditional sentiment analysis focusing on positive and negative sentiment. This tutorial covers approaches and features that migrate well to affect analysis. We also discuss key differences from sentiment analysis, and their implications for analyzing affect and emotion. The tutorial begins with an introduction that highlights opportunities, key terminology, and interesting tasks and challenges (1). The body of the tutorial covers characteristics of emotive language use with emphasis on relevance for computational analysis (2); linguistic data—from conceptual analysis frameworks via useful existing resources to important annotation topics (3); computational approaches for lexical semantic emotion analysis (4); computational approaches for emotion and affect analysis in text (5); visualization methods (6); and a survey of application areas with affect-related problems (7). The tutorial concludes with an outline of future directions and a discussion with participants about the areas relevant to their respective tasks of interest (8). Besides attending the tutorial, tutorial participants receive electronic copies of tutorial slides, a complete reference list, as well as a categorized annotated bibliography that concentrates on seminal works, recent important publications, and other products and resources for researchers and developers.

Imagisaurus: An Interactive Visualizer of Valence and Emotion in the Roget’s Thesaurus
Saif Mohammad
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Sentiment after Translation: A Case-Study on Arabic Social Media Posts
Mohammad Salameh | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

SemEval-2015 Task 10: Sentiment Analysis in Twitter
Sara Rosenthal | Preslav Nakov | Svetlana Kiritchenko | Saif Mohammad | Alan Ritter | Veselin Stoyanov
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews
Svetlana Kiritchenko | Xiaodan Zhu | Colin Cherry | Saif Mohammad
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

NRC-Canada-2014: Recent Improvements in the Sentiment Analysis of Tweets
Xiaodan Zhu | Svetlana Kiritchenko | Saif Mohammad
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Sentiment Analysis of Social Media Texts
Saif M. Mohammad | Xiaodan Zhu
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Automatically detecting the sentiment of product reviews, blogs, tweets, and SMS messages has attracted extensive interest from both academia and industry. It has a number of applications, including: tracking sentiment towards products, movies, politicians, etc.; improving customer relation models; detecting happiness and well-being; and improving automatic dialogue systems. In this tutorial, we will describe how you can create a state-of-the-art sentiment analysis system, with a focus on social media posts. We begin with an introduction to sentiment analysis and its various forms: term level, message level, document level, and aspect level. We will describe how sentiment analysis systems are evaluated, especially through recent SemEval shared tasks: Sentiment Analysis of Twitter (SemEval-2013 Task 2, SemEval-2014 Task 9) and Aspect Based Sentiment Analysis (SemEval-2014 Task 4). We will give an overview of the best sentiment analysis systems at this point in time, including conventional statistical systems as well as those using deep learning approaches. We will describe in detail the NRC-Canada systems, which were the overall best performing systems in all three SemEval competitions listed above. These are simple systems based on lexical and sentiment-lexicon features, which are relatively easy to re-implement. We will discuss features that had the most impact (those derived from sentiment lexicons and negation handling). We will present how large tweet-specific sentiment lexicons can be automatically generated and evaluated. We will also show how negation impacts sentiment differently depending on whether the scope of the negation is positive or negative. Finally, we will flesh out limitations of current approaches and promising future directions.

Generating Music from Literature
Hannah Davis | Saif Mohammad
Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL)

Words: Evaluative, Emotional, Colourful, Musical!
Saif Mohammad
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Semantic Role Labeling of Emotions in Tweets
Saif Mohammad | Xiaodan Zhu | Joel Martin
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

An Empirical Study on the Effect of Negation Words on Sentiment
Xiaodan Zhu | Hongyu Guo | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets
Saif Mohammad | Svetlana Kiritchenko | Xiaodan Zhu
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Computing Lexical Contrast
Saif M. Mohammad | Bonnie J. Dorr | Graeme Hirst | Peter D. Turney
Computational Linguistics, Volume 39, Issue 3 - September 2013

2012

#Emotional Tweets
Saif Mohammad
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

SemEval-2012 Task 2: Measuring Degrees of Relational Similarity
David Jurgens | Saif Mohammad | Peter Turney | Keith Holyoak
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Portable Features for Classifying Emotional Text
Saif Mohammad
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Colourful Language: Measuring Word-Colour Associations
Saif Mohammad
Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics

From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales
Saif Mohammad
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Tracking Sentiment in Mail: How Genders Differ on Emotional Axes
Saif Mohammad | Tony Yang
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

Even the Abstract have Color: Consensus in Word-Colour Associations
Saif Mohammad
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon
Saif Mohammad | Peter Turney
Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text

2009

Using Citations to Generate surveys of Scientific Paradigms
Saif Mohammad | Bonnie Dorr | Melissa Egan | Ahmed Hassan | Pradeep Muthukrishan | Vahed Qazvinian | Dragomir Radev | David Zajic
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus
Saif Mohammad | Cody Dunne | Bonnie Dorr
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Estimating Semantic Distance Using Soft Semantic Constraints in Knowledge-Source – Corpus Hybrid Models
Yuval Marton | Saif Mohammad | Philip Resnik
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Computing Word-Pair Antonymy
Saif Mohammad | Bonnie Dorr | Graeme Hirst
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Tor, TorMd: Distributional Profiles of Concepts for Unsupervised Word Sense Disambiguation
Saif Mohammad | Graeme Hirst | Philip Resnik
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance
Saif Mohammad | Iryna Gurevych | Graeme Hirst | Torsten Zesch
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Determining Word Sense Dominance Using a Thesaurus
Saif Mohammad | Graeme Hirst
11th Conference of the European Chapter of the Association for Computational Linguistics

Distributional measures of concept-distance: A task-oriented evaluation
Saif Mohammad | Graeme Hirst
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2004

Complementarity of lexical and simple syntactic features: The SyntaLex approach to Senseval-3
Saif Mohammad | Ted Pedersen
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation
Saif Mohammad | Ted Pedersen
Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004