The Role of Adverbs in Sentiment Analysis

Sentiment Analysis, an important area of Natural Language Understanding, often relies on the assumption that lexemes carry inherent sentiment values, as reflected in specialized resources. We examine and measure the contribution that eight intensifying adverbs make to the sentiment value of sentences, as judged by human annotators. Our results show, first, that the intensifying adverbs are not themselves sentiment-laden but strengthen the sentiment conveyed by words in their contexts to different degrees. We consider the consequences for appropriate modifications of the representation of the adverbs in sentiment lexicons.


Introduction
It was probably Chuck who coined the term "armchair linguist" (Svartvik, 1991). Chuck Fillmore's deep commitment to the study of language, in particular lexical semantics, on the basis of corpus data served as a model that kept many of us honest in our investigation of language. Today, we are lucky to be able to work from our office chairs while collecting data from a broad speaker group by means of crowdsourcing. And Chuck's FrameNet taught us the importance of considering word meanings in their contexts. Our paper presents work that tries to take this legacy to heart.

Sentiment Analysis
Broadly speaking, sentiment analysis (SA) attempts to automatically derive a writer's "sentiment" about the topic of a text. "Sentiment" is usually categorized into "positive," "neutral," and "negative," where "positive" corresponds to satisfaction or happiness and "negative" to dissatisfaction or unhappiness. Some work in SA further distinguishes degrees of positive and negative sentiment. SA often refers to lexical resources where words are annotated with a sentiment value. SentiWordNet (SWN) (Esuli and Sebastiani, 2006) assigns one of three sentiment values to each synset in WordNet (Fellbaum, 1998). OpinionFinder (OF) (Wilson et al., 2005) identifies the sentiment of the writer. Other resources include the Appraisal Lexicon (AL) (Taboada and Grieve, 2004) and Micro-WNOp (Cerini et al., 2007).
Much of this work relies on the assumption that specific lexemes (unique mappings of word forms and word meanings) carry an inherent sentiment value. This seems intuitively correct for words like enjoy (positive), pencil (neutral) and pain (negative).
Other words may not carry inherent sentiment value yet, in context, contribute to that of the words they co-occur with or modify. One such class of words comprises what we call polarity intensifiers. In this preliminary study, we analyze the contribution of adverbial intensifiers to the sentiment value of the sentences in which they occur.
Consider the adverb absolutely in two sample sentences from movie reviews:

S1: He and Leonora have absolutely no chemistry on screen whatsoever.
S2: I was absolutely delighted by the simple story and amazing animation.
The goal of this preliminary experimental study is to seek answers to the following questions: Do the adverbs change the sentiment rating of a sentence? Is their intensifying effect stronger in positive or in negative contexts? Can they reverse the sentiment orientation of a sentence? And do they carry an inherent sentiment value?

The Experiment
We analyze whether the presence or absence of selected adverbs affects human sentiment ratings of sentences, and how strong the effect of each adverb is. Let S1' be the sentence S1 from which an adverb like absolutely is removed; S2' is defined similarly. Three main observations can be made: (1) the adverb appears in both positive and negative sentiment-bearing sentences (S1 is negative and S2 is positive); (2) its removal from either S1 or S2 does not change the overall polarity of the sentence; (3) intuitively, S1 has a stronger negative polarity value than S1' and S2 has a stronger positive polarity value than S2'. We conduct a preliminary study of polarity intensifier words and show that they all have characteristics (1)-(3). We examine data with eight different adverbs (Table 1).

Data
We extracted sentences containing the target adverbs from a corpus of 50,000 movie reviews (Maas et al., 2011). Each sentence is extracted from a review that is labeled either "positive" or "negative" and correlated with a star rating. We manually inspected the sentences and discarded those where the target adverb was used in a modal sense, as in Seriously, there was not one respectable character in the entire script, while retaining sentences like There is no doubt that Alfred Hitchcock was a seriously talented director. For each adverb, we retained ten sentences each from positive and from negative reviews, for a total of 160 sentences. We then copied the original sentences and removed the adverbs without making any other alterations. Our final dataset thus consisted of 320 sentences: 160 sentence pairs whose members were identical except for the presence or absence of the target adverb. Below is an example of a sentence pair, where the original sentence with the adverb was pre-classified by Pang and Lee (2004) as carrying positive sentiment.
1. I was absolutely delighted by the simple story and amazing animation.
2. I was delighted by the simple story and amazing animation.
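The adverb-removed variants can be produced mechanically. Below is a minimal sketch of this step; the helper name and the adverb handling are ours, not from the paper, and it only covers the simple case of an adverb followed by whitespace:

```python
import re

def remove_adverb(sentence: str, adverb: str) -> str:
    """Delete one occurrence of the target adverb (plus the following
    space), leaving the rest of the sentence untouched."""
    pattern = re.compile(r"\b" + re.escape(adverb) + r"\s+", re.IGNORECASE)
    return pattern.sub("", sentence, count=1)

original = "I was absolutely delighted by the simple story and amazing animation."
print(remove_adverb(original, "absolutely"))
# I was delighted by the simple story and amazing animation.
```

Sentences where the adverb carries trailing punctuation (the modal uses that were discarded anyway) would need extra handling.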

Collecting Judgments via Crowdsourcing
We submitted single sentences (not pairs) to Amazon Mechanical Turk (AMT) to be annotated with sentiment scores. To avoid any bias we shuffled the sentences and displayed them individually. We asked the Turkers to select, for each sentence, one of five sentiment scores: strong positive (2), positive (1), neutral (0), negative (-1), strong negative (-2). Each sentence was rated by five annotators. Altogether, twenty annotators completed the task within eight hours. Since the annotators did not all judge the same set of sentences, we computed the agreement between annotators as follows. For each annotator i, his/her agreement with the others is given by a_i = (1/|S(i)|) Σ_{j ∈ S(i)} ps_ji, where S(i) is the set of sentences annotated by the i-th Turker, |S(i)| is the cardinality of S(i), and ps_ji is the percentage of Turkers whose annotation of sentence j agrees with that of the i-th Turker. The agreement ranges from 0.52 to 0.8. Although the annotation of some Turkers is close to that of flipping a coin, all judgments were retained and included in the results reported here.
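Under one reading of this agreement measure, ps_ji is the fraction of the other annotators of sentence j who chose the same score as annotator i; the sketch below computes a_i that way (function and variable names are illustrative, not from the paper):

```python
from collections import defaultdict

def agreement_scores(annotations):
    """annotations: list of (annotator_id, sentence_id, score) triples.
    Returns annotator_id -> a_i, the mean over that annotator's sentences
    of the fraction of other annotators who gave the same score."""
    by_sentence = defaultdict(list)  # sentence_id -> [(annotator, score)]
    for annotator, sentence, score in annotations:
        by_sentence[sentence].append((annotator, score))

    totals = defaultdict(float)
    counts = defaultdict(int)
    for labels in by_sentence.values():
        for annotator, score in labels:
            others = [s for a, s in labels if a != annotator]
            if others:  # ps_ji is undefined for a singleton annotation
                totals[annotator] += sum(s == score for s in others) / len(others)
                counts[annotator] += 1
    return {a: totals[a] / counts[a] for a in totals}

# Three annotators rate one sentence; two agree, one dissents.
print(agreement_scores([("A", "s1", 1), ("B", "s1", 1), ("C", "s1", 0)]))
# {'A': 0.5, 'B': 0.5, 'C': 0.0}
```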

Results
We report the main results. The polarity rating of a sentence j is the unweighted average of the five annotators' ratings for that sentence, denoted α_j: α_j = (1/5) Σ_{i=1}^{5} r_ij, where r_ij is the score assigned by the i-th annotator to sentence j. A sentence j is classified into one of the five polarity categories by thresholding α_j.
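A sketch of the rating and classification step follows. The ±0.5 cutoffs for positive/negative appear later in the paper; the ±1.5 cutoffs for the "strong" categories are our assumption, since the paper does not state them:

```python
def polarity_rating(scores):
    """Unweighted average of the annotator scores, each in {-2, -1, 0, 1, 2}."""
    return sum(scores) / len(scores)

def classify(alpha):
    """Map an average rating alpha to one of the five polarity categories.
    The +/-0.5 thresholds are the paper's; +/-1.5 are assumed here."""
    if alpha <= -1.5:
        return "strong negative"
    if alpha <= -0.5:
        return "negative"
    if alpha >= 1.5:
        return "strong positive"
    if alpha >= 0.5:
        return "positive"
    return "neutral"

print(classify(polarity_rating([2, 1, 1, 2, 2])))  # strong positive
```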

Do Adverbs Change Sentiment Rating?
We first examine the polarity-intensifying effects of the eight adverbs and determine their relative strength. For each adverb we compute the average polarity rating change between the members of the 20 sentence pairs with and without the target adverb. The second column of Table 2 shows the average polarity rating change for each adverb. All adverbs have a polarity-intensifying effect, which ranges from 0.2 to 0.6. Awfully and seriously have the strongest effect.
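One plausible way to operationalize this "average polarity rating change" is the mean shift away from neutral between the two members of each pair (the paper does not spell out the formula, so this is an assumption):

```python
def average_intensification(pairs):
    """pairs: list of (alpha_with, alpha_without) average ratings for one
    adverb's sentence pairs. Returns the mean increase in the magnitude
    of the rating when the adverb is present."""
    return sum(abs(with_adv) - abs(without_adv)
               for with_adv, without_adv in pairs) / len(pairs)

# Two pairs: one positive, one negative; both are pushed away from neutral.
print(average_intensification([(1.6, 1.2), (-1.0, -0.6)]))  # ~0.4
```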

Change of Sentiment Rating in Positive vs. Negative Contexts
Next we ask whether the adverbs have a stronger polarity-intensifying effect on sentences with a negative, positive, or neutral rating. We partition the 20 sentences with/without each adverb into the three polarity categories according to their average polarity ratings: a sentence j is negative (positive) if α_j ≤ −0.5 (α_j ≥ 0.5). Figure 1 shows the results. For six out of the eight adverbs, the graph follows a V-shaped pattern, indicating that the adverbs have a stronger polarity influence on sentences conveying opinionated, but not neutral, statements. Pretty shows the weakest effect across the board, which makes intuitive sense, as this adverb seems to have a softening/weakening effect: consider "pretty good," which one could judge to be slightly less good than "good." For example, the sentence

He has a pretty strident rant about how important it is.

received an average rating score of 0 with the adverb present and -0.2 without it. The results for awfully and extremely are surprising. A closer look at the annotations revealed some possibly unreliable ratings. For example, the sentence

The part of the movie set in Vietnam was extremely inaccurate.

has an average polarity score of 0 (i.e., neutral) with the adverb and -0.8 without it. Intuitively, this sentence conveys a strong negative sentiment. Such data indicate the need for further study. A more complex scheme for computing the average polarity scores, such as weighting by inter-annotator agreement, might produce better results.

Can Adverbs Reverse Sentiment Orientation?
We ask whether the presence of the adverbs can have the effect of reversing the polarity of a sentence. We again consider three sentiment categories: positive, negative, and neutral. The third column in Table 2 shows, for each adverb, how many sentences out of the total of 20 were judged to have a reversed polarity when the adverb was removed. Overall, the polarities of only 13 out of 160 sentences (i.e., about 8%) change.
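This count can be computed by comparing the coarse polarity category of each pair member; a minimal sketch, reusing the paper's ±0.5 cutoff (function names are ours):

```python
def count_reversals(pairs, tau=0.5):
    """pairs: list of (alpha_with, alpha_without) average ratings.
    Counts pairs whose coarse polarity (negative/neutral/positive)
    differs with vs. without the adverb, using the +/-tau cutoff."""
    def coarse(alpha):
        if alpha <= -tau:
            return "negative"
        if alpha >= tau:
            return "positive"
        return "neutral"
    return sum(coarse(w) != coarse(wo) for w, wo in pairs)

# Of these three pairs, the first and third change category.
print(count_reversals([(0.6, 0.2), (-1.0, -0.8), (0.0, -0.6)]))  # 2
```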

Do Adverbs Have an Inherent Sentiment Value?
Some sentiment lexicons claim that our target adverbs have inherent polarity (see Table 1). If the polarity of a sentence does not change whether the adverb is present or absent, we conclude that the adverb has no inherent polarity but merely affects the intensity of the constituents it modifies. Our results, displayed in Figure 1, indicate that the target adverbs do not carry inherent polarity. Instead, they modify the intensity of the sentiment connoted by the context.

Discussion
We examined the effect of eight intensifying adverbs on the sentiment ratings of the sentences in which they occur. Our study showed that, contrary to their representation in some widely used sentiment lexicons, these adverbs do not carry an inherent sentiment polarity, but merely alter the degree of the polarity of the constituents they modify; corrections of the corresponding entries in the sentiment resources seem warranted. Our results show further that all adverbs strengthen the polarity of the context to different degrees. If confirmed on a larger data set, this indicates that the intensifying force of different adverbs should be reflected in lexical resources, perhaps along an ordered scale.

Related Work
Two recent surveys give a detailed account of sentiment lexicon (SL) acquisition techniques (Feldman, 2013; Liu, 2012). We give only an overview of the related work here. SLs are acquired by one of three methods. Manual tagging is performed by human annotators, e.g., OF and AL. Dictionary-based acquisition relies on a set of seed words that is expanded by using external resources such as WordNet, e.g., (Dragut et al., 2010; Hassan and Radev, 2010; Mohammad et al., 2009; Dragut et al., 2012; Takamura et al., 2005). In corpus-based acquisition, a set of seed words is expanded by using a large corpus of documents (Feng et al., 2013; Lu et al., 2011; Yu et al., 2013; Wu and Wen, 2010).
To our knowledge, none of these works include the polarity intensifiers that we introduce in this paper.