Implicit Sentiment Analysis with Event-centered Text Representation

Implicit sentiment analysis, which aims to detect the sentiment of a sentence that contains no sentiment words, has become an attractive research topic in recent years. In this paper, we focus on event-centric implicit sentiment analysis, which utilizes the sentiment-aware event contained in a sentence to infer its sentiment polarity. Most existing methods for implicit sentiment analysis simply view noun phrases or entities in text as events, or model events only indirectly with sophisticated models. Since events often trigger sentiments in sentences, we argue that this task would benefit from explicit modeling of events and event representation learning. To this end, we represent an event as the combination of its event type and the event triplet <subject, predicate, object>. Based on such event representation, we further propose a novel model with a hierarchical tensor-based composition mechanism to detect sentiment in text. In addition, we present a dataset for event-centric implicit sentiment analysis in which each sentence is labeled with the event representation described above. Experimental results on our constructed dataset and an existing benchmark dataset show the effectiveness of the proposed approach.


Introduction
Sentiment analysis aims at automatically detecting the sentiment of given text. Explicit sentiment analysis methods detect sentiment mainly based on the occurrence of sentiment-related words and have been extensively explored (Agarwal et al., 2011; Tang et al., 2014, 2015a). However, sentiment can also be expressed implicitly. For example, the sentence 'I won the first place in the speech contest' does not contain any sentiment words, but the event of 'winning the first place' reflects positive sentiment. Implicit sentiment analysis was thus proposed to detect sentiment in the absence of sentiment words (Liu, 2012). Compared with explicit sentiment analysis, implicit sentiment analysis is more challenging as there is a lack of explicit cues for inferring the sentiment polarity. Our dataset is available at https://github.com/FloatingIsland2/Implicit-Sentiment-Analysis.
Research on implicit sentiment analysis can be broadly classified into two categories: metaphor-based and event-centric. Metaphor/rhetoric-based implicit sentiment analysis methods typically detect sentiment based on a metaphoric sentiment dictionary and manually designed rules (Zhang and Liu, 2011), while event-centric approaches assume that sentiment is triggered by events described in sentences. Examples include an early approach which regarded noun phrases or entities in sentences as events (Greene and Resnik, 2009), and more recent approaches which model events indirectly by capturing contextual information with graph convolutional networks (Zuo et al., 2020) or attention mechanisms (Wei et al., 2020). We argue that in the former, the event representation is oversimplified and event-related knowledge is consequently largely lost, while in the latter, events are not directly modeled, which makes these methods less effective in detecting sentiments triggered by events.
To overcome the limitations of existing event-centric approaches, we propose to construct a corpus in which event triplets in the form of <subject, predicate, object> and their corresponding types are annotated in sentences. In addition, each sentence is assigned a sentiment class label. An example sentence and its annotation are shown below:

Sentence: 'You abandon me for a week to go off on holiday with daddy, come back and barely 2 days later you go off out with him again.'
Event triplet: <you, abandon, me>
Event type: abandonment
Sentiment: negative

The event triplet and the event type are combined as the final representation of an event. Based on such event-centered text representation, we further propose a method for implicit sentiment analysis built on hierarchical tensor-based compositions, which effectively employs Tensor Composition (Socher et al., 2013; Weber et al., 2017) to encode the interaction between the subject, the predicate, the object, and the sentence. Moreover, we adopt a multi-task learning framework to perform event type classification and sentiment classification simultaneously. Our experimental results show that event type classification benefits sentiment classification.
In summary, our contributions are as follows:
• We propose a novel model with hierarchical tensor-based compositions to detect sentiment based on event-centered text representations, explicitly modeling events and capturing the interaction between the subject, the predicate, the object, and the sentence.
• We further develop a multi-task learning framework to improve sentiment analysis with event type classification.
• We present a dataset, called EveSA, with annotated event triplets, event types, and sentence-level sentiment polarity labels for implicit sentiment analysis.

Related Work
Implicit Sentiment Analysis Liu (2012) first classifies sentiment analysis into explicit and implicit sentiment analysis. Generally speaking, implicit sentiment analysis can be further classified into metaphor-based and event-centric approaches.
In metaphor-based approaches, sentences containing keywords found in a metaphor dictionary are considered implicitly expressing positive or negative emotional tendencies (Zhang and Liu, 2011). In event-centric approaches, events mentioned in text may imply positive or negative sentiments. Balahur et al. (2011) presented an approach for detecting event-triggered sentiment based on commonsense knowledge, EmotiNet, a knowledge base of concepts with associated affective value. Greene and Resnik (2009) used grammatical structures to mine language features related to implicit sentiments, and used similarity calculations to improve the performance of sentiment classification. Zuo et al. (2020) proposed a context-specific heterogeneous graph convolutional network to address the problem of the absence of sentiment words.

Event-related Sentiment Analysis
In recent years, researchers have also paid attention to the importance of event information in sentiment analysis. Deng and Wiebe (2015) encoded a set of sentiment inference rules in a probabilistic soft logic framework for entity/event-level sentiment analysis. Hofmann et al. (2020) encoded properties of events as latent variables, following theories of cognitive appraisal of events, to improve emotion classification performance. Gaonkar et al. (2020) tracked label-label correlations through label embeddings in sentiment classification to maintain the consistency of emotions caused by the same type of event. Ding and Riloff (2016) first defined affective events as triples <subject, verb, object> and created a dataset containing affective events with sentiment polarity labels; Ding and Riloff (2018b) later extended this line of work.

Dataset: EveSA
Since existing sentiment analysis datasets contain only text and sentiment polarity labels, with no event-related annotations, we construct an event-centered dataset for implicit sentiment analysis. We first identify event types from FrameNet (Baker et al., 1998) and then crawl tweets which contain the triggering words of the corresponding event types.

Construction of an Event Type Library
FrameNet (Baker et al., 1998) is an English lexical knowledge base that contains more than 1,200 semantic frames (each frame can be regarded as an event type) and lexical units (each lexical unit can be regarded as a predicate), along with more than 200,000 labeled sentences across all frames, which can be used for NLP tasks such as information extraction and event detection. With the FrameNet knowledge base, we build an event library consisting of event-related components: event types, argument roles (subjects and objects), and event triggers (predicates). Since not all frames in FrameNet are suitable as event types or carry obvious sentiment inclinations, we define the following filtering rules:
• Filter out frames that do not contain both the subject and object argument roles;
• Remove frames that contain more than two arguments, to control the complexity of the events;
• Manually remove frames that carry no sentiment, to ensure the selected event types are balanced across the three sentiment categories (positive, negative, and neutral).
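The three filtering rules above can be sketched as a simple predicate over frames. The dictionary layout below (field names such as `argument_roles` and `has_sentiment`) is hypothetical, chosen only for illustration; FrameNet's actual data format differs.

```python
def keep_frame(frame):
    """Apply the three filtering rules to one frame.
    `frame` is a hypothetical dict, e.g.
    {"name": "Rescuing", "argument_roles": ["subject", "object"],
     "has_sentiment": True}."""
    roles = set(frame["argument_roles"])
    if not {"subject", "object"} <= roles:
        return False                          # rule 1: must have subject and object
    if len(roles) > 2:
        return False                          # rule 2: at most two arguments
    return frame.get("has_sentiment", False)  # rule 3: manual sentiment judgement

frames = [
    {"name": "Rescuing", "argument_roles": ["subject", "object"],
     "has_sentiment": True},
    {"name": "Temporal", "argument_roles": ["time"], "has_sentiment": False},
]
kept = [f["name"] for f in frames if keep_frame(f)]
```

Rule 3 is encoded as a pre-assigned flag here because, in the paper, the sentiment judgement is made manually by the annotators.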
According to these rules, 18 frames are selected out of the roughly 1,200 frames in FrameNet as our event types, listed in Table 1. Finally, we build a dictionary of trigger words for each event type by keeping only verbs and other lexical units without ambiguous word senses as the predicates of each event type.

Data Collection and Cleaning
We employ tweet_scrapper 2 to crawl tweets containing at least one of the trigger words of the event types identified in Section 3.1. To clean the data, the Python package preprocessor 3 is first used to remove URLs, hashtags, and emojis. Next, HTTP tokens, other special tokens, non-English characters, and consecutively repeated characters are removed. Afterwards, each tweet is processed with CoreNLP 4 to obtain word segmentation results. Finally, tweets with more than 40 words are filtered out.
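A minimal regex-based sketch of this cleaning pipeline is shown below. It approximates, rather than reproduces, the `preprocessor` package's behavior; the exact patterns and the 40-word cutoff handling are assumptions.

```python
import re

def clean_tweet(text, max_len=40):
    """Approximate the cleaning pipeline: strip URLs, hashtags/mentions,
    non-English characters, collapse repeated characters, and drop
    tweets longer than max_len words."""
    text = re.sub(r"https?://\S+", " ", text)    # URLs
    text = re.sub(r"[#@]\w+", " ", text)         # hashtags / mentions
    text = re.sub(r"[^\x00-\x7F]+", " ", text)   # non-English characters
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # "soooo" -> "soo"
    tokens = text.split()
    return " ".join(tokens) if len(tokens) <= max_len else None
```

For instance, `clean_tweet("I won!!! #contest soooo happy https://t.co/abc")` yields `"I won!! soo happy"`, and overly long tweets are dropped by returning `None`.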

Annotation of Event Triplets and Sentiment Polarity
Given the word segmentation result of each text, sen = {w_1, ..., w_i, ..., w_n}, where n is the sentence length, two annotators first verify the word segmentation results. Next, the annotators assign the event type etype and mark the position of the predicate (the event trigger words) in the sentence. Afterwards, the arguments are annotated for each event: the j-th argument role under the event type etype is annotated for j = 1, ..., m, where m is the number of argument roles under this etype. Each sentence is thus annotated with an event triplet <subject, predicate, object> and one of the sentiment categories: positive, neutral, or negative. The independently annotated results are compared, and any inconsistent annotations are resolved through discussion. The final dataset contains 18 event types and 3,981 sentences, each with an annotated event triplet. Detailed dataset statistics are shown in Table 1.

Problem Setting
Given (1) a sentence sen = {w_1, w_2, ..., w_n} associated with a sentiment category; (2) its event triplet <subject: subj = w_1, w_2, ..., w_{n_subj}; predicate: pred = w_1, w_2, ..., w_{n_pred}; object: obj = w_1, w_2, ..., w_{n_obj}>; and (3) its corresponding event type e_type, the task of the fact-based implicit sentiment analysis model is to predict the sentiment distribution over the three sentiment polarities. We propose an event-centered text representation model to solve this problem.

Overview
Traditional sentence-level sentiment analysis infers the sentiment category of a sentence from the text directly. We instead propose to detect the sentiment category of a sentence based on its associated event, i.e., the triplet <subject, predicate, object> and the event-related information, in addition to the textual information of the sentence, as depicted in Figure 1.

Figure 1: Our event-centered approach, in which sentiment is detected based on the event triplet and the event type in addition to the textual information of the sentence. Z is the noise introduced to increase sentence diversity.
The overall architecture of our proposed method is presented in Figure 2. It consists of three parts. (1) To integrate the event triplet <subject, predicate, object> with the sentence, hierarchical tensor-based compositions are employed. Both the event triplet and the sentence are first fed into a BERT encoder (Devlin et al., 2018). Then the bottom-level tensor composition (Weber et al., 2017) is used to obtain the event representation e, which is further combined with the BERT-encoded sentence representation by the top-level tensor composition to generate the output r_final.
(2) To model the event type information, a multi-task learning framework is used to perform event type classification and sentiment classification simultaneously, since our preliminary experiments showed that accurate classification of event types benefits the sentiment classification task. (3) Since the same event can be described in sentences with diverse surface forms, we assume the BERT-encoded sentence representation h_sen follows a Gaussian distribution whose mean is determined by the event representation e, i.e., h_sen ~ N(e, σ²). This constraint is added to the loss function to keep the sentence representation close to the learned event representation in the embedding space.

Hierarchical Tensor-based Compositions
The event triplet input to the BERT encoder is: [CLS] subject [SEP] predicate [SEP] object [SEP]. The hidden states of these three parts are H_subj ∈ R^{L_subj×d}, H_pred ∈ R^{L_pred×d}, and H_obj ∈ R^{L_obj×d}, where L_subj, L_pred, and L_obj are the maximum sequence lengths of the subject, predicate, and object, respectively, and d is the dimension of the BERT hidden states. After averaging the constituent token embeddings of each event element separately, the hidden states of the three parts are h_subj ∈ R^d, h_pred ∈ R^d, and h_obj ∈ R^d, which are then fed into Tensor-based Compositions to model the relationship among the subject, the predicate, and the object.
First, the interaction between the subject and the predicate, r_s_p, is computed as follows:

r_s_p = f( h_subj^T T^{[1:k]} h_pred + W [h_subj; h_pred] + b ),

where T^{[1:k]} ∈ R^{d×d×k} is a tensor whose i-th slice T^{[i]} ∈ R^{d×d} produces one entry of the bilinear term h_subj^T T^{[i]} h_pred (i = 1, ..., k), yielding a vector in R^k; the other parameters form a standard feed-forward layer, with weight matrix W ∈ R^{k×2d}, bias vector b ∈ R^k, and nonlinearity f. The calculation diagram of Tensor Composition is shown in Figure 3. Similarly, the interaction between the predicate and the object, r_p_o, is computed by:

r_p_o = f( h_pred^T T^{[1:k]} h_obj + W [h_pred; h_obj] + b ).

Finally, the interaction between the subject and the object is calculated from r_s_p and r_p_o in the second layer of the Tensor-based Compositions, with its output regarded as the event representation e:

e = f( r_s_p^T T^{[1:k]} r_p_o + W [r_s_p; r_p_o] + b ).

The sentence representation h_sen is encoded with BERT. To better fuse the event representation e with the contextual information in the sentence, we model their interaction in the same way, using Tensor-based Composition, to derive the final representation r_final:

r_final = f( e^T T^{[1:k]} h_sen + W [e; h_sen] + b ).
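One tensor-composition layer can be sketched in numpy as follows. This is a minimal illustration of the Socher et al. (2013)-style bilinear composition, not the paper's PyTorch implementation; the toy dimensions (d = 8, k = 4) and the tanh nonlinearity are assumptions for readability.

```python
import numpy as np

def tensor_composition(h1, h2, T, W, b):
    """Bilinear tensor composition:
    r_i = tanh(h1^T T[i] h2 + (W [h1; h2] + b)_i) for each slice i."""
    k = T.shape[0]
    bilinear = np.array([h1 @ T[i] @ h2 for i in range(k)])  # shape (k,)
    linear = W @ np.concatenate([h1, h2]) + b                # shape (k,)
    return np.tanh(bilinear + linear)

# toy dimensions, far smaller than the paper's d = 768, k = 100
d, k = 8, 4
rng = np.random.default_rng(0)
h_subj, h_pred = rng.normal(size=d), rng.normal(size=d)
T = rng.normal(size=(k, d, d))              # tensor slices T[1..k]
W, b = rng.normal(size=(k, 2 * d)), np.zeros(k)
r_s_p = tensor_composition(h_subj, h_pred, T, W, b)
```

Stacking such layers, with (h_subj, h_pred), (h_pred, h_obj), then (r_s_p, r_p_o), and finally (e, h_sen) as input pairs, reproduces the hierarchy described above.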

Multi-task Learning with Event Type Classification
Since each sentence in our dataset is annotated with an event type and a sentiment polarity category, we use a multi-task learning framework to exploit event type information in sentiment classification. The two tasks, event type classification and sentiment classification, share the same encoder and hierarchical Tensor-based Compositions. The event type classifier applies e to generate the event type distribution, while the sentiment classifier uses the final representation r_final to compute the sentiment distribution. The event type classification loss L_e is defined as:

L_e = - (1/N) Σ_{i=1}^{N} Σ_{c_e=1}^{E} y_i^{c_e} log p(c_e | x_i),

where N and E denote the total number of training examples and event types, respectively, p(c_e | x_i) is the predicted probability of the i-th example x_i belonging to class c_e, and y_i^{c_e} is the ground-truth label indicator.
The sentiment classification loss L_s is defined in the same way, except that the number of classes K is the total number of sentiment categories, which is 3 in our dataset:

L_s = - (1/N) Σ_{i=1}^{N} Σ_{c_s=1}^{K} y_i^{c_s} log p(c_s | x_i).

The multi-task framework is trained by minimizing the cross-entropy losses of event classification L_e and sentiment classification L_s.
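The two losses are ordinary cross-entropies over the shared representation. A toy numpy sketch (with hypothetical predicted distributions, not real model outputs) shows how they combine:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood: -(1/N) sum_i log p(y_i | x_i)."""
    n = len(labels)
    return -np.mean(np.log(probs[np.arange(n), labels]))

# hypothetical predicted distributions for 2 examples
p_event = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # 3 event types
p_sent = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])    # 3 sentiments
y_event, y_sent = np.array([0, 1]), np.array([0, 2])      # gold labels

# the two task losses share the encoder upstream; here we just sum them
loss = cross_entropy(p_event, y_event) + cross_entropy(p_sent, y_sent)
```

In the full objective the two terms are weighted by hyperparameters (5 and 1 in the paper's setup) rather than summed uniformly.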

Constraint of Sentence-Event Similarity
Since the same event can be described in sentences with different surface forms, we assume that the sentence representation follows a Gaussian distribution whose mean is determined by the event representation: h_sen ~ N(e, σ²), where the variance σ² is computed from the output of the hierarchical Tensor-based Compositions.
We design a loss function, based on the negative log of the Gaussian probability density function, to encourage the sentence representation to be close to the event representation:

L_sim = (1/N) Σ_{t=1}^{N} ( ||h_sen^{(t)} - e^{(t)}||² / (2σ_t²) + (1/2) log σ_t² ),

where log σ_t² is a variance regularization term that prevents the module from predicting too large a variance σ_t².
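This constraint can be sketched as a Gaussian negative log-likelihood (dropping the constant term). The sketch below, with a scalar log-variance for simplicity, is an assumption about the implementation, not the paper's exact code:

```python
import numpy as np

def sentence_event_loss(h_sen, e, log_var):
    """Negative Gaussian log-likelihood of the sentence representations
    around the event representations, plus the variance regularizer
    (1/2) log sigma^2. h_sen, e: arrays of shape (N, d)."""
    var = np.exp(log_var)
    sq_dist = np.sum((h_sen - e) ** 2, axis=-1)       # ||h_sen - e||^2 per example
    return float(np.mean(sq_dist / (2 * var) + 0.5 * log_var))
```

When the sentence and event representations coincide and log σ² = 0, the loss is zero; it grows with their squared distance, pulling h_sen toward e in the embedding space.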
Finally, the combined loss function is defined as:

L = λ_e L_e + λ_s L_s + λ_sim L_sim,

where λ_e, λ_s, and λ_sim are hyperparameters controlling the relative contribution of each loss term.

Datasets and Evaluations
To evaluate our proposed approach, we conducted experiments on two corpora: EveSA, the dataset presented in Section 3 for Event-driven Sentiment Analysis, and SemEval17 Task4 (Rosenthal et al., 2017). SemEval-2017 Task 4 Subtask A aims to identify the overall sentiment of a tweet. The data was first cleaned by removing hashtags, user mentions, irregular phrases, and abbreviations. The original dataset does not have annotations of event triplets. Existing event extraction models are mainly trained on the ACE 2005 dataset (Grishman et al., 2005), whose news-event triggers and arguments are quite different from those of the personal social events mentioned in this dataset. Therefore, we used a Semantic Role Labeling model (He et al., 2017) to extract a predicate from each sentence and marked its subject and object manually. We manually checked more than 20,000 tweets in this dataset and retained 7,117 tweets which contain full event triplets <subject, predicate, object>.

The statistics of the two corpora used in our experiments are shown in Table 2.

Table 2: Dataset statistics.
Category   EveSA   SemEval17 Task4
Positive   1,205   3,247
Neutral      624   2,001
Negative   2,152   1,869
Total      3,981   7,117

It should be noted that since we use the SRL model rather than an event extraction model to annotate the SemEval17 Task4 dataset, no event types are annotated there. Thus the multi-task learning module is skipped in the experiments on this dataset.
For both EveSA and SemEval17 Task4, we employ accuracy and weighted-F1 as evaluation metrics. The weighted-F1 is computed by:

weighted-F1 = Σ_{i=1}^{N} weight_i × F1_i,

where N is the total number of class categories, F1_i is the F1 score of the i-th class, and weight_i is the weight of the i-th class, i.e., its proportion in the training set.
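The metric above can be computed from scratch as follows; this is a plain-Python sketch equivalent to the standard weighted-F1 (as in scikit-learn's `f1_score` with `average='weighted'`), with class weights taken from the gold labels:

```python
from collections import Counter

def weighted_f1(y_true, y_pred, labels):
    """Weighted-F1: per-class F1 averaged with weights equal to each
    class's share of the gold examples."""
    n = len(y_true)
    counts = Counter(y_true)
    score = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (counts[c] / n) * f1         # weight_i * F1_i
    return score
```

Perfect predictions give a score of 1.0; classes absent from the gold labels contribute zero weight.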

Implementation Details
We implement the models in PyTorch 1.4.0. The BERT encoder is fine-tuned during training. The dimension of the hidden states is 768 and the batch size is 32. The number of slices in the tensor compositions, k, is set to 100. The number of epochs is 5. The loss weights λ_e, λ_s, and λ_sim are set to 5, 1, and 1, respectively. The loss function is minimized using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 1e-5 and a dropout rate of 0.1. All parameters are chosen based on a validation set comprising 20% of the respective training set.

Compared Methods
Since our model takes both the event and the sentence as input, we compare it with baselines taking three kinds of inputs.
Sentence as input Models in this category take a tweet (i.e., a sentence) as input: BiLSTM (Schuster and Paliwal, 1997): a bidirectional LSTM neural network with GloVe embeddings (Pennington et al., 2014) for the word sequence. BERT (Devlin et al., 2018): the state-of-the-art model for sentiment classification; we limit the sentence length to the last 40 tokens to allow a larger batch size and use BERT-base due to the memory limit of our GPU. BiLSTM + orthogonalAtt (Wei et al., 2020): a recently proposed implicit sentiment analysis model with a bidirectional LSTM, a BERT encoder, and orthogonal attention.
Event triplet as input Models in this category take the event triplet in the form of <subject, predicate, object> as input. The first two baselines, BiLSTM and BERT, take the concatenated words in the subject, predicate, and object as an input sequence. NTN (Weber et al., 2017): a neural network with tensor-based compositions, taking the subject, predicate, and object of the event triplet as inputs, with GloVe embeddings (Pennington et al., 2014) for the word sequence.
BERTNTN: an NTN model with BERT as the encoder, using fine-tuned BERT to obtain the hidden states of the word sequence.
Sentence and event triplet as input Models in this category consider both the input sentence and the extracted event triplet: BERTNTN + Att (Vaswani et al., 2017): built upon BERTNTN, with the sentence also encoded and fused with the event representation via attention. A further variant following Tang et al. (2015b): similar to BERTNTN + Att, except that each event type has a randomly initialized event embedding, analogous to deriving user embeddings. Due to the lack of event type labels in SemEval17 Task4, we are unable to conduct experiments with models that make use of event types.
Results and Analysis
Experimental results on the two corpora are shown in Table 3. It can be observed that: (1) The proposed model performs remarkably better than all baselines across all input categories and on all evaluation metrics, including a recently proposed implicit sentiment analysis model, which verifies the effectiveness of our model in capturing events that evoke sentiments in sentences.
(2) Event representation in the form of an event triplet leads to improved sentiment classification performance, as evidenced by the lower performance of BERT (using the sentence as the only input) compared with BERTNTN + Att and BERTNTN sentence (encoding both the sentence and the event triplet). It should be noted that models using event triplets as the only input, such as BERT (with event triplet concatenation as input) and NTN, give inferior results because the sentence's contextual information is not modeled. (3) The event type, the extra annotated information in our dataset EveSA, plays a vital role in joint sentiment and event type prediction, since HTC + MTL + Sen-Event outperforms HTC + Sen-Event.

Ablation Study
To validate the effectiveness of the components in our approach, we performed ablation experiments on both corpora; the results are shown in Table 4 and Table 5, respectively. BERT takes only the sentence as input, while BERTNTN uses only the event triplets. Event sentence concatenates the outputs of BERT and BERTNTN for classification. HTC, MTL, and Sen_Event refer to the hierarchical tensor-based compositions, the multi-task learning with event type classification, and the constraint of sentence-event similarity, respectively. The full models are HTC + MTL + Sen_Event for EveSA and HTC + Sen_Event for SemEval17 Task4.
It can be observed that using event triplets as input only (BERTNTN) gives the worst results. BERT trained on tweets directly improves upon BERTNTN quite substantially. Simply combining outputs from BERT and BERTNTN performs worse than BERT. Using the hierarchical tensor compositions (HTC) is more effective in encoding both sentence contextual information and event triplets, outperforming BERT. Multi-task learning further improves the performance of HTC. Finally, the combination of all three components (or the two components without MTL for SemEval17 Task4) described in Section 4 achieves the best results on both datasets.

Figure 4: Case study of example classification outputs by BERT and our model. The event information, including the event triplet and the event type, is also shown. Instances incorrectly predicted by BERT but correctly predicted by our approach are selected. The last column, "BERT → Ours", shows the prediction of BERT vs. the prediction of our approach.

Case Study
To further analyze whether event information is the key factor that evokes sentiment, and to provide interpretable results for the predictions of our event-centered text representation model, we compare the classification results of our proposed method with those of BERT through a case study. The former employs both the event and the sentence as features, while the latter achieves state-of-the-art results in sentence-level text classification without considering the events that trigger sentiments expressed in text. Concrete cases are shown in Figure 4: instances that BERT fails to predict correctly while our model succeeds.
It can be seen from examples [S1] to [S4] that, in the absence of sentiment words, BERT tends to classify sentences as neutral. By contrast, by utilizing event representations and jointly performing event type classification and sentiment analysis, our model benefits from the emotional semantics contained in event triplets, such as <Most of what the government does, serving, the public's will>, and the event type Assistance, achieving better results on sentences with implicitly expressed emotions.
[S5] and [S6] are examples misjudged by BERT as negative but are correctly classified by our method. It can be observed that the event information is indeed helpful in implicit sentiment analysis.

Conclusion
In this paper, we propose a novel approach for fact-based implicit sentiment analysis, targeting the situation where a sentence contains no sentiment words but only event descriptions. Our model employs event information, including event triplets and event types, as features, and detects sentiment based on event representations learned with hierarchical tensor-based compositions. Moreover, we present a dataset with event annotations for implicit sentiment analysis. Experimental analysis demonstrates that both event triplets and event types benefit implicit sentiment classification. Our current approach assumes that events have already been extracted from text; future research will explore automated event extraction to enable end-to-end implicit sentiment analysis.