Learning Implicit Sentiment in Aspect-based Sentiment Analysis with Supervised Contrastive Pre-Training

Aspect-based sentiment analysis aims to identify the sentiment polarity of a specific aspect in product reviews. We notice that about 30% of reviews do not contain obvious opinion words, but still convey clear human-aware sentiment orientation, which is known as implicit sentiment. However, recent neural network-based approaches have paid little attention to the implicit sentiment entailed in reviews. To overcome this issue, we adopt Supervised Contrastive Pre-training on large-scale sentiment-annotated corpora retrieved from in-domain language resources. By aligning the representations of implicit sentiment expressions to those with the same sentiment label, the pre-training process enables the model to better capture both implicit and explicit sentiment orientation towards aspects in reviews. Experimental results show that our method achieves state-of-the-art performance on SemEval-2014 benchmarks, and comprehensive analysis validates its effectiveness in learning implicit sentiment.


Introduction
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task aiming to identify the sentiment polarity of one or more mentioned aspects in product reviews. Recent studies tackle the task by either employing attention mechanisms (Wang et al., 2016b; Ma et al., 2017) or incorporating syntax-aware graph structures (He et al., 2018; Tang et al., 2020; Sun et al., 2019). Both methodologies aim to capture the corresponding sentiment expression towards a particular aspect, which is usually an opinion word that explicitly expresses sentiment polarity. For instance, given the restaurant review "Great food but the service is dreadful", current models attempt to find "great" for the aspect "food" to determine the positive sentiment polarity towards it.

Table 1: Examples of reviews containing implicit sentiment, where aspect terms are marked in bold.

    The waiter poured water on my hand and walked away
    The bartender continued to pour champagne from his reserve
    10 hours of battery life ...
    The battery life is probably an hour

In the first two examples, "pour" expresses opposite sentiments in different contexts. In the last two examples, people determine the sentiment orientation towards "battery" by referring to a common battery lifetime.
However, implicit sentiment expressions are pervasive in aspect-based sentiment analysis. An implicit sentiment expression is a sentiment expression that contains no explicit polarity markers but still conveys a clear human-aware sentiment polarity in context (Russo et al., 2015). As illustrated in Table 1, the comment "The waiter poured water on my hand and walked away" towards the aspect "waiter" contains no opinion words, but can be clearly interpreted as negative. According to Table 2 (see Section 4), 27.47% and 30.09% of reviews contain implicit sentiment in the Restaurant and Laptop datasets, respectively. However, most previous methods pay little attention to modeling implicit sentiment expressions. This motivates us to better solve ABSA by capturing implicit sentiment more effectively.
The main obstacle to equipping current models with the ability to capture implicit sentiment is the limited size of ABSA datasets. With only a few thousand labeled examples, models can hardly recognize comprehensive patterns of sentiment expressions, and cannot acquire the commonsense knowledge required for sentiment identification. This suggests that external sentiment knowledge should be introduced. Therefore, we adopt Supervised ContrAstive Pre-Training (SCAPT) on external large-scale sentiment-annotated corpora to learn such knowledge. Supervised contrastive learning yields aligned representations of sentiment expressions with the same sentiment label: in the embedding space, explicit and implicit sentiment expressions with the same sentiment orientation are pulled together, while those with different sentiment labels are pushed apart. Because the sentiment annotations of the retrieved corpora are noisy, supervised contrastive learning also enhances the noise immunity of the pre-training process. In addition, SCAPT includes review reconstruction and masked aspect prediction objectives. The former requires the representation to encode the review context besides sentiment polarity, and the latter strengthens the model's ability to capture the sentiment target. Overall, the pre-training process captures both implicit and explicit sentiment orientation towards aspects in reviews.
Experimental evaluations conducted on the SemEval-2014 (Pontiki et al., 2014) and MAMS datasets show that the proposed SCAPT outperforms baseline models by a large margin. Results on the partitioned datasets demonstrate its effectiveness on both implicit and explicit sentiment expressions. Moreover, an ablation study verifies that SCAPT efficiently learns implicit sentiment from the external noisy corpora. Code and datasets are publicly available 1.
The contributions of this work include:
• We reveal that implicit sentiment, to which previous studies paid little attention, is an important and under-addressed part of ABSA.
• We propose Supervised Contrastive Pre-training to learn sentiment knowledge from large-scale sentiment-annotated corpora.
• Experimental results show that our proposed model achieves state-of-the-art performance and is effective in learning implicit sentiment.

Implicit Sentiment
Implicit sentiment is sentiment that can only be inferred from the context of a review, and many studies have addressed its presence in sentiment analysis. Toprak et al. (2010) and Russo et al. (2015) proposed similar terminologies (implicit polarity or polar facts) and provided corpora containing implicit sentiment. Deng and Wiebe (2014) detected implicit sentiment via inference over explicit sentiment expressions and so-called goodFor/badFor events. Choi and Wiebe (2014) used the +/-EffectWordNet lexicon to identify implicit sentiment, by assuming that sentiment expressions are often related to states and events which have positive/negative/null effects on entities.
To investigate how widespread implicit sentiment is in ABSA, we split the SemEval-2014 Restaurant and Laptop benchmarks into an Explicit Sentiment Expression (ESE) slice and an Implicit Sentiment Expression (ISE) slice, based on the presence of opinion words. Fan et al. (2019) annotated opinion words for target aspects on the SemEval benchmarks. We notice that their released datasets do not keep the original order and have some textual differences. Thus, we first match the annotations to the original datasets, and then manually pick the reviews that include opinion words towards the aspect from the remaining part. As shown in Table 2 (see Section 4), 27.47% and 30.09% of reviews fall into the ISE part of Restaurant and Laptop, respectively, revealing that implicit sentiment exists widely in ABSA and is worth exploring.
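A minimal sketch of this partition (leaving aside the manual check) is shown below, assuming each example carries its annotated opinion words under a hypothetical opinion_words field; the actual annotation-matching step is not reproduced here.

```python
# A minimal sketch of splitting a dataset into ESE/ISE slices based on whether the
# target aspect has annotated opinion words. The "opinion_words" field is hypothetical.
def split_ese_ise(examples):
    ese, ise = [], []
    for ex in examples:
        # Examples with at least one annotated opinion word for the target aspect
        # go to the ESE slice; the rest go to the ISE slice.
        (ese if ex.get("opinion_words") else ise).append(ex)
    return ese, ise
```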

Methodology
In this section, we introduce the pre-training and fine-tuning scheme of our models. In pre-training, we introduce Supervised ContrAstive Pre-Training (SCAPT) for ABSA, which learns the polarity of sentiment expressions by leveraging a retrieved review corpus. In fine-tuning, aspect-aware fine-tuning is adopted to enhance the models' ability of aspect-based sentiment identification.

Supervised Contrastive Pre-training
Three objectives are included in SCAPT: supervised contrastive learning, masked aspect prediction, and review reconstruction. The details of SCAPT's procedure are shown in Figure 1.
Transformer Encoder Backbone The pre-training scheme is built on a Transformer encoder (Vaswani et al., 2017). We denote the retrieved review corpus used in SCAPT as $\mathcal{D} = \{x_1, x_2, \ldots, x_n\}$, containing $n$ sentences, where the $i$-th sentence $x_i$ is labeled with $y_i$. For each input sentence $x_i$, following Devlin et al. (2019), we format the input as $I_i = \mathrm{[CLS]} + x_i + \mathrm{[SEP]}$ and feed it into the model. The output vector of the [CLS] token encodes the sentence representation $\bar{h}_i$:

$$\bar{h}_i = \mathrm{TransformerEncoder}(I_i)_{\mathrm{[CLS]}}.$$

[Figure 1: An overview of SCAPT on ABSA. SCAPT consists of three objectives, in which Supervised Contrastive Learning aligns the representations with the same sentiment label.]
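For concreteness, the following is a minimal sketch of this input formatting and [CLS]-based sentence encoding using the BERT backbone variant; the tokenizer and model calls follow the Hugging Face interface, and the variable names are illustrative rather than taken from the released implementation.

```python
# A minimal sketch: format a review as [CLS] + x_i + [SEP] and take the [CLS]
# hidden state as the sentence representation h_i.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

review = "The waiter poured water on my hand and walked away"
inputs = tokenizer(review, return_tensors="pt")  # adds [CLS] ... [SEP] automatically

with torch.no_grad():
    outputs = encoder(**inputs)
# Sentence representation: hidden state of the [CLS] token (position 0).
h_i = outputs.last_hidden_state[:, 0]            # shape: (1, hidden_size)
```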
Supervised Contrastive Learning Inspired by Khosla et al. (2020), we adopt supervised contrastive learning objective in SCAPT to align the representation of explicit and implicit sentiment expressions with the same emotion. Supervised contrastive learning encourages the model to capture the entailed sentiment orientation in context and incorporate it in sentiment representation.
Specifically, for each $(x_i, y_i)$ within a batch $B$, we first extract the sentiment representation $s_i = W_s \bar{h}_i$ from the sentence representation $\bar{h}_i$ of $x_i$, where $W_s$ can be seen as a trainable sentiment perceptron for sentences. The supervised contrastive loss on the batch $B$ is defined as:

$$P^{\mathrm{sup}}_B(i, c) = \frac{\exp(\mathrm{sim}(s_i, s_c)/\tau)}{\sum_{j \in B,\, j \neq i} \exp(\mathrm{sim}(s_i, s_j)/\tau)},$$

$$\mathcal{L}^{\mathrm{sup}}_B = -\sum_{i \in B} \frac{1}{C_i} \sum_{c \in B,\; y_c = y_i,\; c \neq i} \log P^{\mathrm{sup}}_B(i, c).$$

Here, $P^{\mathrm{sup}}_B(i, c)$ indicates the likelihood that $s_c$ is most similar to $s_i$, and $\tau$ is the temperature of the softmax. We simply use $\mathrm{sim}(s_i, s_c) = s_i \cdot s_c$ as the similarity metric. The supervised contrastive loss $\mathcal{L}^{\mathrm{sup}}_B$ is accumulated over every sentence $x_i$ in $B$, where $C_i = |\{c \mid y_c = y_i, c \neq i\}|$ is the number of samples in $B$ sharing the category $y_i$. Notably, we do not directly use the sentence representation in the supervised contrastive pre-training process. Instead, we use the sentiment representation to make full use of the document-level labeled corpora in mining the inherent sentiment perception.
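A minimal PyTorch sketch of this loss, assuming a batch of sentiment representations s (one row per sentence) and their integer sentiment labels y, could look as follows; it uses the dot-product similarity and temperature τ defined above, with names that are ours rather than the released code's.

```python
# A minimal sketch of the supervised contrastive loss described above.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(s, y, tau=0.07):
    sim = s @ s.t() / tau                                   # pairwise dot-product similarities
    # Exclude self-similarity from the softmax normalization.
    self_mask = torch.eye(len(y), dtype=torch.bool, device=s.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = F.log_softmax(sim, dim=1)                    # log P_sup(i, c)
    # Positives: same sentiment label, excluding the anchor itself.
    pos_mask = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask
    n_pos = pos_mask.sum(dim=1).clamp(min=1)                # C_i
    # Average log-likelihood over positives per anchor, then over the batch.
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / n_pos).mean()
    return loss
```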
Review Reconstruction Motivated by the power of denoising auto-encoders (Vincent et al., 2008) and their success in pre-training models (Lewis et al., 2020), we further propose a review reconstruction task to enhance the context-semantic modeling of the sentence representation. If the model is pre-trained solely on the supervised contrastive learning task, which only focuses on sentiment regularization, the essential semantic information is not completely preserved in the sentence representations. Thus, we additionally employ review reconstruction in SCAPT to capture comprehensive context information in the sentence representations.
Generally, this objective reconstructs the whole sentence $x_i$ from the sentence representation $\bar{h}_i$. After encoding $x_i$ into $\bar{h}_i$, the latter is fed to a Transformer decoder for autoregressive generation:

$$\hat{x}_i = \mathrm{TransformerDecoder}(\bar{h}_i),$$

where $\hat{x}_i$ is the recovered sentence and $\bar{h}_i$ acts as the beginning-of-sentence input embedding in the decoding process to control the whole generation. We use the original sentence $x_i$ without masking as the gold reference of the review reconstruction objective:

$$\mathcal{L}^{\mathrm{RR}}_i = -\sum_{t=1}^{|x_i|} \log P\left(w_t \mid \bar{h}_i, w_{<t}\right).$$

Masked Aspect Prediction In masked aspect prediction, the model learns to predict the masked aspects from a corrupted version of each review. The masking strategy for input reviews consists of the following two steps:
1. Aspect Span Masking. Since all inputs are from our retrieved corpora, we ensure that each review contains at least one aspect. For each input, the tokens of aspect spans are replaced with [MASK] with 80% probability, replaced with a random token with 10% probability, and otherwise kept unchanged. Aspect span masking provides a better capture of aspect words.
2. Random Masking. After aspect span masking, if the proportion of masked tokens is less than 15%, we randomly mask additional tokens from the remaining ones to reach that proportion.
We denote the input token of [MASK] as $w_{\mathrm{MASK}}$. For each masked input token at the $k$-th position, its contextualized hidden representation $h_{ik}$ is fed into a softmax layer to predict the original word:

$$P_{\mathrm{map}}(k) = \mathrm{softmax}(W_o h_{ik}),$$

where $h_{ik}$ is the output of the Transformer encoder at the $k$-th position, $W_o$ is a trainable parameter matrix, and $P_{\mathrm{map}}(k)$ is the predicted probability distribution over the original word at the $k$-th position. The masked aspect prediction loss accumulates the negative log-likelihood over all masked positions:

$$\mathcal{L}^{\mathrm{MAP}}_i = -\sum_{k \in \mathcal{M}_i} \log P_{\mathrm{map}}(k)\left[w_k\right],$$

where $\mathcal{M}_i$ denotes the set of masked positions in $x_i$. Different from MLM (Devlin et al., 2019) or sentiment masking (Tian et al., 2020), masked aspect prediction focuses more on modeling aspect-related context information in aspect-based representations, which complements the other pre-training objectives and purposefully benefits our fine-tuning scheme.
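The two-step masking strategy can be sketched as follows, assuming tokens is a list of token strings, aspect_spans gives the (start, end) positions of aspect terms, and vocab is a list of candidate replacement tokens; the exact preprocessing in the released implementation may differ.

```python
# A minimal sketch of aspect span masking followed by random masking.
import random

def mask_review(tokens, aspect_spans, vocab, mask_token="[MASK]", mask_ratio=0.15):
    tokens = list(tokens)
    masked_positions = set()
    # 1. Aspect span masking: 80% [MASK], 10% random token, 10% kept unchanged.
    for start, end in aspect_spans:
        for k in range(start, end):
            p = random.random()
            if p < 0.8:
                tokens[k] = mask_token
            elif p < 0.9:
                tokens[k] = random.choice(vocab)
            masked_positions.add(k)   # aspect positions are predicted even if unchanged
    # 2. Random masking: mask extra tokens until 15% of the review is masked.
    rest = [k for k in range(len(tokens)) if k not in masked_positions]
    random.shuffle(rest)
    while len(masked_positions) < mask_ratio * len(tokens) and rest:
        k = rest.pop()
        tokens[k] = mask_token
        masked_positions.add(k)
    return tokens, sorted(masked_positions)
```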
Joint Training The three losses mentioned above are combined and jointly optimized in SCAPT. For the overall pre-training loss $\mathcal{L}^{\mathrm{pre}}_B$ on a batch $B$, the review reconstruction loss and the masked aspect prediction loss are counted on each example $b \in B$, and $\alpha$ and $\beta$ are coefficients that balance the objectives:

$$\mathcal{L}^{\mathrm{pre}}_B = \mathcal{L}^{\mathrm{sup}}_B + \sum_{b \in B}\left(\alpha\, \mathcal{L}^{\mathrm{RR}}_b + \beta\, \mathcal{L}^{\mathrm{MAP}}_b\right).$$
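Putting the pieces together, one SCAPT pre-training step could be sketched as follows; the encoder, decoder, map_head components and the batch fields are hypothetical stand-ins for the modules described above, not the released implementation.

```python
# A hypothetical sketch of one SCAPT pre-training step; component and field names
# are illustrative stand-ins.
def scapt_step(batch, encoder, decoder, map_head, W_s, alpha=1.0, beta=1.0):
    # Encode the masked reviews: h holds the [CLS] sentence representations,
    # token_states the per-token hidden states.
    h, token_states = encoder(batch.masked_input_ids)
    s = h @ W_s.t()                                      # sentiment representations
    l_sup = supervised_contrastive_loss(s, batch.labels)
    l_rr = decoder.reconstruction_loss(h, batch.original_input_ids)
    l_map = map_head.prediction_loss(token_states, batch.masked_positions,
                                     batch.original_input_ids)
    # L_pre = L_sup + alpha * L_RR + beta * L_MAP (per-example losses summed over B).
    return l_sup + alpha * l_rr + beta * l_map
```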

Aspect-Aware Fine-tuning
Our proposed models are fine-tuned on ABSA benchmarks with aspect-aware fine-tuning, so as to fully leverage their sentiment identification ability. They also learn to capture aspect-related sentiment information during fine-tuning. Specifically, given a sentence $x^{ab} = \{w_1, \ldots, w_a, \ldots, w_n\}$ in an ABSA dataset $\mathcal{D}^{ab}$, where $w_a$ is one of the aspects occurring in $x^{ab}$, the models predict the aspect-level sentiment orientation $y^{ab}_a$ based on the aspect-based representation $\bar{h}^{ab}_a$ and the sentiment representation $s^{ab}$.
Aspect-based Representation Research on pre-trained contextualized word representations (Ethayarajh, 2019) has demonstrated that they capture context information related to the word. Thus, instead of using laborious methods to embed the aspect information, we extract the aspect-based representation $\bar{h}^{ab}_a$ by collecting the final hidden states that correspond to $w_a$. In fine-tuning, $\bar{h}^{ab}_a$ focuses on aspect-related words in context, which we believe enhances the perception of aspect-specific opinion words and gives the model a good view of explicit sentiment. Specifically, let $I_a$ be the set of token indices of the aspect $w_a$; we average the hidden states $h_i$ for all $i \in I_a$ to acquire the aspect-based representation:

$$\bar{h}^{ab}_a = \frac{1}{|I_a|} \sum_{i \in I_a} h_i.$$

Notably, when processing multiple aspects $w_{a_1}, w_{a_2}, \ldots$ in a sentence $x^{ab}$, we extract the aspect-based representations $\bar{h}^{ab}_{a_1}, \bar{h}^{ab}_{a_2}, \ldots$ in a single run, whereas previous methods embed the aspect and encode the whole input for each aspect one by one.
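A minimal sketch of this extraction, with hidden_states being the encoder's final hidden states for one sentence and aspect_indices the token index set I_a (names are illustrative):

```python
# A minimal sketch of extracting the aspect-based representation by averaging the
# final hidden states over the aspect's token positions.
# hidden_states: torch.Tensor of shape (seq_len, hidden_size)
# aspect_indices: list of token indices belonging to the aspect term
def aspect_representation(hidden_states, aspect_indices):
    return hidden_states[aspect_indices].mean(dim=0)

# Multiple aspects in one sentence reuse the same encoder pass:
#   h_a1 = aspect_representation(hidden_states, indices_a1)
#   h_a2 = aspect_representation(hidden_states, indices_a2)
```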
Representation Combination For sentiment classification, the aspect-based representation and the sentiment representation are considered jointly to predict the aspect-level sentiment polarity. In this way, the fine-tuned model builds a perception of both word-occurrence-related explicit sentiment and semantics-related implicit sentiment. We reuse the sentiment perceptron $W_s$ from pre-training to extract the sentiment representation $s^{ab}$ from the sentence representation. The sentiment representation $s^{ab}$ and the aspect-based representation $\bar{h}^{ab}_a$ are then concatenated to predict the aspect-level sentiment polarity:

$$\hat{y}^{ab}_a = \mathrm{softmax}\left(W_a\,[s^{ab}; \bar{h}^{ab}_a]\right),$$

where $\hat{y}^{ab}_a$ is the prediction for the aspect $w_a$ and $W_a$ is a trainable parameter matrix. Finally, our fine-tuning objective is the cross-entropy loss of this prediction, $\mathcal{L}^{ab} = -\sum_{x^{ab} \in \mathcal{D}^{ab}} \log \hat{y}^{ab}_a$, where $\hat{y}^{ab}_a$ here denotes the predicted probability of the gold polarity.
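The combination and classification step can be sketched as follows; the module and variable names are illustrative rather than the released implementation.

```python
# A minimal sketch of the aspect-level classifier used in fine-tuning: concatenate the
# sentiment representation and the aspect-based representation, then project to the
# three polarity classes.
import torch
import torch.nn as nn

class AspectClassifier(nn.Module):
    def __init__(self, hidden_size, num_classes=3):
        super().__init__()
        self.W_a = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, s_ab, h_ab_a):
        logits = self.W_a(torch.cat([s_ab, h_ab_a], dim=-1))
        # Log-probabilities, to be used with a negative log-likelihood loss.
        return torch.log_softmax(logits, dim=-1)
```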

Experimental Settings
ABSA Datasets Our experiments are mainly conducted on two benchmarks, the Laptop and Restaurant reviews from SemEval-2014 task 4 (Pontiki et al., 2014). We use the ESE and ISE slices of their test sets to evaluate model performance on explicit and implicit sentiment, respectively. The process to build these slices is detailed in Section 2. Furthermore, we also use a more challenging dataset, Multi-Aspect Multi-Sentiment (MAMS), which shares the same domain as SemEval-2014 Restaurant. All these datasets involve three sentiment categories: positive, neutral, and negative. The details of these ABSA datasets can be found in Table 2.

Retrieved External Corpora
We retrieve large-scale sentiment-annotated corpora from document-level labeled data for pre-training. Specifically, we first extract five-star-rated/one-star-rated reviews from the Yelp 2 and Amazon Review (He and McAuley, 2016) datasets, and label them as positive/negative. This procedure mitigates the noise in the 5-way rated document-level sentiment language source. We then keep only reviews on the topic of restaurants/laptops to make sure that the pre-training corpora and the ABSA datasets are in the same domain. Next, we split these document-level reviews into sentences and keep the sentences containing the same aspect terms as those mentioned in the ABSA training sets. The sentiment label of each sentence is determined by the label of its original review. After this retrieval process, we acquire about 1.56/0.51 million sentence-level reviews from Yelp/Amazon that are noisily labeled as positive/negative. After manually checking a small portion of both corpora, we confirm that both implicit and explicit sentiment expressions are present. We pre-train our models on the retrieved corpus that shares the same domain with the downstream ABSA task.

Models with SCAPT We apply SCAPT to a Transformer encoder and to BERT, and both models are then fine-tuned with aspect-aware fine-tuning; we refer to them as TransEncAsp+SCAPT and BERTAsp+SCAPT, respectively. We use a 300-dimensional randomly initialized Transformer encoder with 6 layers and 6 heads, and BERT-base-uncased, as the backbones. Pre-training for the Transformer encoder and BERT takes 80 and 8 epochs, respectively. We adopt Adam (Kingma and Ba, 2015) with warm-up to optimize our models, with learning rate 1e−3 for the Transformer encoder and 5e−5 for BERT. The pre-trained models are fine-tuned by aspect-aware fine-tuning with a 5e−5 learning rate. The hyper-parameters are set to α = β = 1 for combining objectives in SCAPT, and τ = 0.07 in supervised contrastive learning.
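A minimal sketch of the corpus retrieval heuristic described at the beginning of this subsection is given below, assuming each raw review is a dict with "stars" and "text" fields (field names are illustrative) and aspect_terms is the set of aspect terms collected from the ABSA training sets.

```python
# A minimal sketch of retrieving noisily labeled sentence-level reviews.
from nltk import sent_tokenize  # requires the NLTK "punkt" tokenizer data

def retrieve_corpus(reviews, aspect_terms):
    corpus = []
    for review in reviews:
        if review["stars"] == 5:
            label = "positive"
        elif review["stars"] == 1:
            label = "negative"
        else:
            continue  # drop ambiguous 2-4 star reviews to reduce label noise
        for sentence in sent_tokenize(review["text"]):
            # Keep sentences that mention an aspect term from the ABSA training sets.
            if any(term in sentence.lower() for term in aspect_terms):
                corpus.append((sentence, label))
    return corpus
```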
Baselines We compare the proposed models with baselines from different perspectives to comprehensively evaluate the performance of our approach, including Rietzler et al. (2020) and R-GAT+BERT.

[Table 3: Overall performance of different methods on Restaurant and Laptop. We rerun the code of the baselines and report their accuracy on the ESE and ISE slices of the two datasets. For baselines whose accuracy or F1-score is not reported, we also report the accuracy and F1-score of our rerun version; these results are marked with *.]

To better analyze the effect of SCAPT and aspect-aware fine-tuning, we further include the following variants as baselines:
• TransEncAsp: Directly apply aspect-aware fine-tuning to a randomly initialized Transformer encoder without pre-training.
• BERTAsp+CEPT: Merely replace the supervised contrastive learning loss with cross-entropy loss in SCAPT. Other settings are the same as BERTAsp+SCAPT.

Results and Analysis
This section presents the experimental results. Our model achieves state-of-the-art performance on three ABSA benchmarks, and we illustrate the representation alignment effect of supervised contrastive learning and the effectiveness of the other components from several perspectives. Moreover, we show that our model is capable of identifying implicit sentiment, and attribute this capability to supervised contrastive learning in SCAPT.

Main Results
The performance of the baselines and our proposed models is shown in Table 3. Models are evaluated with Accuracy and Macro-F1. Several observations can be made from the results.
Our model achieves SOTA performance. BERTAsp+SCAPT outperforms the current SOTA model by 1.97%/3.80% on Restaurant/Laptop. TransEncAsp+SCAPT performs better than most baselines that do not use pre-trained knowledge. Moreover, BERTAsp+SCAPT also achieves the best performance on the ESE/ISE slices of the two datasets, revealing the effectiveness of the proposed pre-training scheme.
After being pre-trained with SCAPT, models improve significantly on ABSA tasks.
Compared with BERTAsp, which is directly fine-tuned on the ABSA datasets, BERTAsp+SCAPT achieves a 3.31%/4.23% performance gain on Restaurant/Laptop, which is convincing proof that acquiring in-domain knowledge with proper adaptive pre-training is still necessary for knowledge-enhanced models, and that SCAPT is an effective approach for this. Moreover, TransEncAsp+SCAPT is 6.29%/11.34% better than TransEncAsp, illustrating that incorporating sentiment knowledge with SCAPT greatly strengthens ABSA models.
SCAPT is good at learning implicit sentiment. This can be verified from several perspectives. First, compared with its performance on ESE, BERTAsp+SCAPT is much stronger on ISE. Compared with other works, BERTAsp+SCAPT is around 0-2% better on the ESE slices, but surpasses the previous SOTA model by 4.49%/4.60% on the ISE slices. Therefore, the strong performance of BERTAsp+SCAPT is mainly attributable to its awareness of implicit sentiment. Second, TransEncAsp+SCAPT behaves much better than BERTAsp on the ISE slices. Although it is exposed only to a million-scale pre-training corpus, TransEncAsp+SCAPT is generally worse than BERTAsp on the whole task, but exceeds BERTAsp by 4.88%/4.43% on the ISE slices. This demonstrates that SCAPT is data-efficient in learning implicit sentiment. Last, after being pre-trained with SCAPT, models attain a remarkable performance gain on ISE, which is much more significant than on ESE.
BERTAsp+SCAPT is 2% better than BERTAsp on ESE, but outperforms the latter by 8.61%/9.20% on ISE. For the Transformer encoder based models, the performance gain on ISE after SCAPT exceeds 20%. We conclude that what models learn in SCAPT is predominantly the perception of implicit sentiment.
Aspect-aware fine-tuning serves as a complement to SCAPT. We find that models with aspect-aware fine-tuning perform better on the ESE slices of the datasets. Specifically, BERTAsp performs worse on ISE but better on ESE compared with BERT-SPC, and is therefore better overall on the two datasets. The better performance of BERTAsp on the ESE slices is mainly due to its use of the aspect-based representation, which attends to aspect-related context that may contain sentiment orientation. This characteristic of aspect-aware fine-tuning makes it suitable for enhancing the recognition of explicit sentiment in models pre-trained with SCAPT.

Table 4 shows the performance of the baselines and our models on the MAMS dataset. Though it is challenging to distinguish the sentiment polarities of multiple aspects in a single sentence, the results show that TransEncAsp+SCAPT outperforms baselines that lack external sentiment knowledge, and BERTAsp+SCAPT achieves state-of-the-art performance in the multi-aspect scenario. The effectiveness of our models can be attributed to both SCAPT and aspect-aware fine-tuning, since they enhance the learning of implicit and explicit sentiment, respectively. Besides, BERTAsp outperforms BERT-SPC by a much larger margin on MAMS than on Restaurant/Laptop. We attribute this to its modeling of contextual information in the aspect-based representation, which is more important in multi-aspect ABSA.

Implicit Sentiment Learning in SCAPT
We attribute the learning of implicit sentiment in SCAPT to two key factors: exposure to sentiment knowledge and the use of supervised contrastive learning. The results in Table 3 show that implicit sentiment is more challenging to learn than explicit sentiment, and previous methods based on attention or syntax modeling do not tackle the issue well. The knowledge-enhanced baselines perform slightly better, with a 5% performance gain on ISE. By pre-training on large-scale sentiment-annotated corpora, our models achieve a remarkable performance improvement on implicit sentiment learning, with a 19.59%/29.62% relative gain for TransEncAsp. These results show that in-domain sentiment knowledge, which is provided by our retrieved corpora, is essential for implicit sentiment learning. Furthermore, the models pre-trained with the supervised contrastive learning objective surpass cross-entropy classification on the ISE slices. Compared with BERTAsp+CEPT, BERTAsp+SCAPT is 4.49%/1.73% better on ISE, which leads to its better performance on the whole task. The supervised contrastive learning objective enhances the noise immunity of the pre-training process, so the pre-trained models are more effective in learning implicit sentiment.

Ablation Study on SCAPT
As illustrated in Table 5, we validate the effectiveness of each component with an ablation study. First, removing the supervised contrastive learning loss (-SCL) leads to a 2.38% performance drop on Restaurant, which is more significant than removing the other two objectives (-MAP-RR). This verifies that supervised contrastive learning plays the primary role in SCAPT. Besides, we observe that removing the masked aspect prediction and review reconstruction objectives also brings a performance drop, demonstrating that these mechanisms are also indispensable in SCAPT.

Hidden Sentiment Representations
To better understand the behavior of our proposed methods, we visualize the sentiment representations using t-SNE (Van der Maaten and Hinton, 2008). As seen in Figure 3, models with sentiment pre-training embed sentiment expressions well, while many misclassifications can be found for BERTAsp. The visualization also shows that BERTAsp+SCAPT tightly clusters the representations of both implicit and explicit sentiment expressions.
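A visualization of this kind can be produced with a short script along the following lines, where reps is an (N, d) array of sentiment representations and labels their gold polarities (a generic sketch, not the exact plotting code used for Figure 3).

```python
# A minimal sketch of the t-SNE visualization of sentiment representations.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_sentiment_space(reps, labels):
    points = TSNE(n_components=2, random_state=0).fit_transform(reps)
    for polarity in set(labels):
        idx = [i for i, y in enumerate(labels) if y == polarity]
        plt.scatter(points[idx, 0], points[idx, 1], s=5, label=polarity)
    plt.legend()
    plt.show()
```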

Aspect Robustness
We analyze the robustness of our proposed models on aspect robustness test sets. Aspect robustness of ABSA was first emphasized and tested by Xing et al. (2020), who applied several perturbations to reviews from Restaurant and Laptop. TextFlint (Wang et al., 2021) extended these transformations by introducing transformations from various linguistic perspectives. The test sets are designed to probe whether models can distinguish the sentiment of the target aspect from non-target aspects and unrelated information. Table 6 lists the performance of the tested models, which convincingly demonstrates the robustness of our proposed models. Compared with the obvious performance drops of the baseline models, BERTAsp+SCAPT performs significantly better, with only a 9.05%/6.63% decline on Restaurant and Laptop. The results show that models pre-trained with SCAPT are more robust to aspect-level perturbations, which we attribute to better modeling of sentiment and context information with the enhancement of in-domain sentiment knowledge.

Related Work
Neural Network Methods for ABSA Early neural network methods (Wang et al., 2016b; Ma et al., 2017) for ABSA employed various attention mechanisms to identify aspect-related context. Memory networks (Tang et al., 2016; Chen et al., 2017; Wang et al., 2018) were further proposed to identify the sentiment expression corresponding to an aspect. Recent efforts (He et al., 2018; Tang et al., 2020) used syntax information from dependency trees to enhance attention-based models. Many works (Sun et al., 2019) make use of graph neural networks to incorporate tree-structured syntactic information and capture aspect-related information in text. Another line of work in ABSA concentrated on utilizing external corpora and pre-trained knowledge to enhance the semantic awareness of models (Rietzler et al., 2020; Dai et al., 2021).

Contrastive Representation Learning
Our work adopts a contrastive method in representation learning to acquire discriminative instance representations. Recent work on contrastive representation learning of instances is usually based on estimating representation similarities between similar and dissimilar pairs, which are typically composed in a self-supervised manner. In particular, Khosla et al. (2020) illustrated a supervised contrastive method that builds positive pairs between instances with the same class label and pulls their representations together. In this work, our models learn to capture implicit sentiment from informative but noisy language resources through supervised contrastive pre-training.

Conclusion
In this paper, we introduce Supervised ContrAstive Pre-Training (SCAPT) for ABSA. Noticing that implicit sentiment is not well handled by current neural network based ABSA models, we argue that more sentiment knowledge is required to solve this issue. We therefore retrieve large-scale in-domain annotated corpora, and propose SCAPT to learn sentiment knowledge from them. Experimental results show that our proposed models with SCAPT achieve SOTA performance. Moreover, SCAPT proves effective for implicit sentiment learning. We hope to inspire future research on learning and modeling implicit sentiment with knowledge-enhanced methods.