Deep Context- and Relation-Aware Learning for Aspect-based Sentiment Analysis

Existing works for aspect-based sentiment analysis (ABSA) have adopted a unified approach, which allows interactive relations among subtasks. However, we observe that these methods tend to predict polarities based on the literal meaning of aspect and opinion terms and mainly consider relations among subtasks only implicitly at the word level. In addition, identifying multiple aspect-opinion pairs with their polarities is much more challenging. Therefore, a comprehensive understanding of contextual information w.r.t. the aspect and opinion is further required in ABSA. In this paper, we propose the Deep Contextualized Relation-Aware Network (DCRAN), which allows interactive relations among subtasks with deep contextual information based on two modules (i.e., Aspect and Opinion Propagation and Explicit Self-Supervised Strategies). In particular, we design novel self-supervised strategies for ABSA, which have strengths in dealing with multiple aspects. Experimental results show that DCRAN significantly outperforms previous state-of-the-art methods by large margins on three widely used benchmarks.


Table 1: Motivating examples (E1-E3) with predictions of the baseline (Chen and Qian, 2020) that we reimplement, e.g., E2: "The sushi (neg) is cut in blocks bigger than my cell phone." All results are based on the BERT-base model for a fair comparison. The polarity labels pos, neu, and neg denote positive, neutral, and negative, respectively.
Existing works for ABSA have adopted a two-step approach, which considers each subtask separately (Tang et al., 2016; Xu et al., 2018). Most recently, however, unified approaches have achieved significant performance improvements on the ABSA task. Luo et al. (2020) focused on modeling the interactions between aspect terms, and Chen and Qian (2020) exploited dyadic and triadic relations between subtasks (i.e., ATE, OTE, ASC).
Despite the impressive results, their methods have two limitations. First, they only consider relations among subtasks at the word level and do not explicitly utilize contextualized information of the whole sequence. For example, in E1 in Table 1, the opinion term "better" seems to express a positive opinion of "Japanese food". However, the authentic meaning of E1 is "The Japanese food I have had at the food court was more delicious than the one I had at this restaurant". Thus, previous approaches tend to assign polarities based on the literal meaning of aspect and opinion terms (E2). Second, identifying multiple aspect-opinion pairs and their polarities is much more challenging, as the model needs to not only detect multiple aspects and opinions but also correctly predict the polarity of each aspect (E3).
To address the aforementioned issues, we propose the Deep Contextualized Relation-Aware Network (DCRAN) for ABSA. DCRAN not only implicitly allows interactive relations among the subtasks of ABSA, but also explicitly considers their relations by using contextual information. Our main contributions are as follows: 1) We design an aspect and opinion propagation decoder so that the model has a comprehensive understanding of the whole context, which results in better polarity prediction. 2) We propose novel self-supervised strategies for ABSA, which are highly effective in dealing with multiple aspects and in considering deep contextualized information together with the aspect and opinion terms. To the best of our knowledge, this is the first attempt to design explicit self-supervised methods for ABSA. 3) Experimental results demonstrate that DCRAN significantly outperforms previous state-of-the-art methods on three widely used benchmarks.

Task Definition
Given a sentence $S = \{w_1, w_2, \ldots, w_n\}$, where $n$ denotes the number of tokens, we aim to solve three subtasks as sequence labeling problems: aspect term extraction (ATE), opinion term extraction (OTE), and aspect-based sentiment classification (ASC). The ATE task aims to identify a sequence of aspect-term tags $Y^a = \{y^a_1, y^a_2, \ldots, y^a_n\}$, where $y^a_i \in \{B, I, O\}$, and the OTE task aims to identify a sequence of opinion-term tags $Y^o = \{y^o_1, y^o_2, \ldots, y^o_n\}$, where $y^o_i \in \{B, I, O\}$. Likewise, the ASC task aims to assign a sequence of polarities $Y^p = \{y^p_1, y^p_2, \ldots, y^p_n\}$, where $y^p_i \in \{POS, NEU, NEG, O\}$. The labels POS, NEU, and NEG denote positive, neutral, and negative, respectively.
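For illustration, the three label sequences for a toy two-aspect sentence might look as follows (the example sentence is ours, not drawn from the benchmarks):

```python
# Toy illustration of the three sequence-labeling targets (example is ours).
tokens = ["The", "sushi", "was", "fresh", "but", "the", "service", "was", "slow", "."]

# ATE: aspect term spans in BIO format ("sushi" and "service" are aspects).
y_aspect = ["O", "B", "O", "O", "O", "O", "B", "O", "O", "O"]

# OTE: opinion term spans in BIO format ("fresh" and "slow" are opinions).
y_opinion = ["O", "O", "O", "B", "O", "O", "O", "O", "B", "O"]

# ASC: polarity tags aligned with aspect tokens; non-aspect tokens are "O".
# Two aspects with opposite polarities show why multiple-aspect cases are harder.
y_polarity = ["O", "POS", "O", "O", "O", "O", "NEG", "O", "O", "O"]

assert len(tokens) == len(y_aspect) == len(y_opinion) == len(y_polarity)
```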

Task-Shared Representation Learning
Following existing works, we utilize pre-trained language models such as BERT (Devlin et al., 2019) and ELECTRA (Clark et al., 2020) as the shared encoder to construct the context representation shared by the subtasks ATE, OTE, and ASC. Given a sentence $S = \{w_1, w_2, \ldots, w_n\}$, the pre-trained language model takes the input sequence $X_{absa} = [\,[\text{CLS}]\; w_1\; w_2\; \ldots\; w_n\; [\text{SEP}]\,]$ and outputs a sequence of shared context representations $H = \{h_{[\text{CLS}]}, h_1, h_2, \ldots, h_n, h_{[\text{SEP}]}\} \in \mathbb{R}^{d_h \times (n+2)}$, where $d_h$ denotes the hidden dimension of the shared encoder. We denote the parameters of the shared encoder as $\Theta_s$. Then, we utilize a single-layer feed-forward neural network (FFNN) as

$$\hat{Y}^a = \text{softmax}(W_2 \tanh(W_1 H)), \quad (1)$$

where $W_1 \in \mathbb{R}^{d_h \times d_h}$ and $W_2 \in \mathbb{R}^{3 \times d_h}$ are trainable parameters. The parameters of the single-layer FFNN for aspect term extraction are denoted as $\Theta_a$. The objective of aspect term extraction is minimizing the negative log-likelihood (NLL) loss:

$$\mathcal{L}_{ate} = -\sum_{i=1}^{n} \log p(y^a_i \mid S; \Theta_s, \Theta_a).$$
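As an illustrative sketch (not the authors' released code), the shared encoder with a single-layer FFNN tagging head could be implemented in PyTorch as follows; the Hugging Face transformers API and the tanh nonlinearity are our assumptions:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer  # assumed tooling, not from the paper

class SharedEncoderWithATEHead(nn.Module):
    """Task-shared encoder (Theta_s) with a single-layer FFNN head (Theta_a) for ATE.
    The OTE head would be analogous; the tanh nonlinearity is our assumption."""

    def __init__(self, model_name="bert-base-uncased", num_tags=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # shared parameters Theta_s
        d_h = self.encoder.config.hidden_size
        # W1 in R^{d_h x d_h} followed by W2 in R^{3 x d_h}, mirroring Equation 1.
        self.ffnn = nn.Sequential(nn.Linear(d_h, d_h), nn.Tanh(), nn.Linear(d_h, num_tags))

    def forward(self, input_ids, attention_mask):
        # H = {h_[CLS], h_1, ..., h_n, h_[SEP]}: one d_h-dimensional vector per token
        H = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.ffnn(H)  # per-token logits over {B, I, O}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SharedEncoderWithATEHead()
batch = tokenizer("The sushi was fresh.", return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
# Training would minimize the NLL loss, e.g., nn.CrossEntropyLoss() over gold {B, I, O} tags.
```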

Aspect and Opinion Propagation
We utilize the transformer decoder (Vaswani et al., 2017) to consider the relations of aspect and opinion while predicting a sequence of polarities. Our transformer decoder is mainly composed of a multi-head self-attention, two multi-head cross attentions, and a feed-forward layer. The multi-head self-attention takes the shared context representation $H$ as

$$U^h = \text{LN}(H + \text{MultiHead}(H, H, H)), \quad (2)$$

and $U^h$, $Z^a$, and $Z^o$ (the aspect and opinion representations obtained from the ATE and OTE subtasks) are fed into two steps of cross multi-head attention as

$$U^a = \text{LN}(U^h + \text{MultiHead}(U^h, Z^a, Z^a)), \quad (3)$$
$$U^o = \text{LN}(U^a + \text{MultiHead}(U^a, Z^o, Z^o)), \quad (4)$$

where LN represents layer normalization (Ba et al., 2016). Note that Equations 3 and 4 represent aspect and opinion propagation, respectively. Then $U^o$ is fed into a single-layer FFNN to obtain a sequence of polarities $Y^p$. The objective of aspect-based sentiment classification is minimizing the NLL loss:

$$\mathcal{L}_{asc} = -\sum_{i=1}^{n} \log p(y^p_i \mid S; \Theta_s).$$

The architecture of the aspect and opinion propagation is described in Figure 1-(a).
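A minimal PyTorch sketch of this decoder block is shown below; the residual connections, head count, and the aspect-then-opinion ordering of the two cross-attention steps are our assumptions rather than details confirmed by the paper:

```python
import torch
import torch.nn as nn

class AspectOpinionPropagation(nn.Module):
    """Sketch of the propagation decoder: multi-head self-attention over H, then two
    cross-attention steps over aspect (Z_a) and opinion (Z_o) representations,
    mirroring Equations 2-4. Residuals and head count are our assumptions."""

    def __init__(self, d_h=768, n_heads=8, num_polarities=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_h, n_heads, batch_first=True)
        self.cross_attn_a = nn.MultiheadAttention(d_h, n_heads, batch_first=True)
        self.cross_attn_o = nn.MultiheadAttention(d_h, n_heads, batch_first=True)
        self.ln1, self.ln2, self.ln3 = nn.LayerNorm(d_h), nn.LayerNorm(d_h), nn.LayerNorm(d_h)
        self.ffnn = nn.Linear(d_h, num_polarities)  # logits over {POS, NEU, NEG, O}

    def forward(self, H, Z_a, Z_o):
        U_h = self.ln1(H + self.self_attn(H, H, H)[0])             # Equation 2
        U_a = self.ln2(U_h + self.cross_attn_a(U_h, Z_a, Z_a)[0])  # aspect propagation (Eq. 3)
        U_o = self.ln3(U_a + self.cross_attn_o(U_a, Z_o, Z_o)[0])  # opinion propagation (Eq. 4)
        return self.ffnn(U_o)  # per-token polarity logits for Y_p

H = torch.randn(2, 10, 768)  # shared context representation for a batch of 2
Z_a, Z_o = torch.randn(2, 10, 768), torch.randn(2, 10, 768)
print(AspectOpinionPropagation()(H, Z_a, Z_o).shape)  # torch.Size([2, 10, 4])
```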

Explicit Self-Supervised Strategies
To further exploit the aspect-opinion relation with contextualized information of a sentence, we propose explicit self-supervised strategies consisting of two auxiliary tasks: 1) type-specific masked term discrimination and 2) pairwise relations discrimination.

Type-Specific Masked Term Discrimination
In the type-specific masked term discrimination task, we uniformly mask aspect terms, opinion terms, or terms that correspond to neither, using the special token [MASK]. The input sequence of a masked sentence is represented as

$$X_{tsmtd} = [\,[\text{CLS}]\; w_1\; \ldots\; [\text{MASK}]\; \ldots\; w_n\; [\text{SEP}]\,]$$

and is fed into the pre-trained language model. Then, the output representation of the [CLS] token is used to classify which type of term is masked in the sentence as

$$\hat{Y}^m = \text{softmax}(W_3 h_{[\text{CLS}]}),$$

where $W_3 \in \mathbb{R}^{3 \times d_h}$ represents trainable parameters and $\hat{Y}^m \in \{Aspect, Opinion, O\}$. The parameters of the linear projection layer for type-specific masked term discrimination are denoted as $\Theta_m$. Then, the NLL loss of type-specific masked term discrimination is defined as:

$$\mathcal{L}_{tsmtd} = -\log p(y^m \mid X_{tsmtd}; \Theta_s, \Theta_m).$$

This allows the model to explicitly exploit sentence-level information by discriminating what kind of term is masked.
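The masking procedure can be sketched as follows; whether a single token or a whole term is masked per sentence is our assumption:

```python
import random

def mask_term_type(tokens, aspect_idx, opinion_idx, mask_token="[MASK]"):
    """Toy TSMTD input builder. Uniformly choose one term type, mask one token of
    that type, and return the masked tokens with the label in {Aspect, Opinion, O}.
    (Masking a single token rather than a whole term is our assumption.)"""
    term_type = random.choice(["Aspect", "Opinion", "O"])
    if term_type == "Aspect":
        candidates = aspect_idx
    elif term_type == "Opinion":
        candidates = opinion_idx
    else:  # tokens belonging to neither aspects nor opinions
        candidates = [i for i in range(len(tokens)) if i not in aspect_idx + opinion_idx]
    pos = random.choice(candidates)
    masked = [mask_token if i == pos else t for i, t in enumerate(tokens)]
    return masked, term_type

# The masked sequence is re-encoded and h_[CLS] is fed to a 3-way linear
# classifier (W3) to predict which term type was masked.
print(mask_term_type(["The", "sushi", "was", "fresh", "."], aspect_idx=[1], opinion_idx=[3]))
```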
Pairwise Relations Discrimination
In this task, we uniformly replace both aspect and opinion terms using the special token [REL]. The input sequence of a masked sentence is represented as

$$X_{prd} = [\,[\text{CLS}]\; w_1\; \ldots\; [\text{REL}]\; \ldots\; [\text{REL}]\; \ldots\; w_n\; [\text{SEP}]\,]$$

and is fed into the pre-trained language model. Then, the output representation of the [CLS] token is used to discriminate whether the replaced tokens have a pairwise relation as

$$\hat{Y}^r = \text{softmax}(W_4 h_{[\text{CLS}]}),$$

where $W_4 \in \mathbb{R}^{2 \times d_h}$ represents trainable parameters and $\hat{Y}^r \in \{True, False\}$. The parameters of the linear projection layer for pairwise relations discrimination are denoted as $\Theta_r$. Then, the NLL loss of pairwise relations discrimination is defined as:

$$\mathcal{L}_{prd} = -\log p(y^r \mid X_{prd}; \Theta_s, \Theta_r).$$

We describe the negative sampling method used to replace aspect and opinion terms in Appendix A.2.
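A toy sketch of the input construction is given below; the exact negative-sampling scheme is described in Appendix A.2, so the `paired` flag here simply stands in for it:

```python
def build_prd_input(tokens, aspect_idx, opinion_idx, paired, rel_token="[REL]"):
    """Toy PRD input builder. Replace an aspect and an opinion with [REL]; the
    label is True when the two replaced terms actually form a pair, and False
    for a negative sample (e.g., an opinion drawn from a different pair; the
    paper's negative sampling in Appendix A.2 governs this in practice)."""
    replaced = [rel_token if i in (aspect_idx + opinion_idx) else t
                for i, t in enumerate(tokens)]
    return replaced, paired

# True pair: "sushi" <-> "fresh". h_[CLS] of the replaced sequence is fed to a
# binary linear classifier (W4) to predict {True, False}.
tokens = ["The", "sushi", "was", "fresh", "."]
print(build_prd_input(tokens, aspect_idx=[1], opinion_idx=[3], paired=True))
```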

Joint Learning Procedure
All these tasks are jointly trained, and the final objective is defined as

$$\mathcal{L} = \mathcal{L}_{ate} + \mathcal{L}_{ote} + \mathcal{L}_{asc} + \alpha\,(\mathcal{L}_{tsmtd} + \mathcal{L}_{prd}),$$

where $\alpha$ is a hyper-parameter determining the weight of the auxiliary tasks. Note that the parameters $\Theta_s$ are optimized for all subtasks. In particular, $\Theta_s$ is further optimized through $\mathcal{L}_{tsmtd}$ and $\mathcal{L}_{prd}$ to explicitly exploit the relations between aspect and opinion together with the contextual meaning.
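Under the assumption that the three subtask losses are summed with unit weights (only the alpha scaling of the auxiliary losses is stated explicitly), the objective reduces to a one-line function; the default value of alpha below is hypothetical:

```python
def total_loss(l_ate, l_ote, l_asc, l_tsmtd, l_prd, alpha=0.1):
    """Joint objective: subtask losses plus alpha-weighted auxiliary losses.
    Unit weights on the subtask losses and alpha=0.1 are our assumptions."""
    return l_ate + l_ote + l_asc + alpha * (l_tsmtd + l_prd)
```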

Experimental Setup
We evaluate our model on three widely used sentiment analysis benchmarks: laptop reviews (LAP14) and restaurant reviews (REST14 and REST15) (Wang et al., 2018).

Table 2: Evaluation results on the LAP14, REST14, and REST15 datasets, which are provided by Chen and Qian (2020). All results except ours are cited from existing works (Chen and Qian, 2020; Peng et al., 2020; Mao et al., 2021), and all baselines are described in Appendix A.4. We report average results over five runs with random initialization. The best scores are in bold, and the second-best scores are underlined for each type of pre-trained language model (e.g., GloVe- or BERT-based). '-' denotes unreported results.

Ablation Study
To study the effectiveness of the aspect propagation (AP), opinion propagation (OP), type-specific masked term discrimination (TSMTD), and pairwise relations discrimination (PRD), we conduct ablation experiments on the REST14 dataset. We set the baseline as the model that utilizes neither aspect and opinion propagation nor the explicit self-supervised strategies. When the AP and OP are not utilized, a single-layer FFNN as in Equation 1 is used instead of the transformer decoder to predict the sequence of polarities $Y^p$. As shown in Table 3, we observe that the AP is more effective than the OP, and scores drop significantly when neither the AP nor the OP is utilized. In the case of the explicit self-supervised strategies, we observe that the PRD is more effective than the TSMTD. As the PRD objective is to discriminate whether the replaced tokens have a pairwise aspect-opinion relation, it allows the model to better exploit the relations between aspect and opinion at the sentence level.

Table 4: Aspect analysis on the REST14 and REST15 datasets. Comparison of ABSA-F1 and sentence-level accuracy for sentences containing a single aspect versus multiple aspects.

Aspect Analysis
We conduct an aspect analysis by comparing sentences containing a single aspect with those containing multiple aspects. As shown in Table 4, Aspect and Opinion Propagation significantly improves performance when the sentence contains a single aspect, while only a small increase is observed in the multiple-aspect case. Although implicitly considering the relations between aspect and opinion can improve performance in the single-aspect case, it is not sufficient to induce performance improvement in the multiple-aspect case. This suggests that additional explicit tasks are required to identify multiple aspects with their corresponding opinions, which helps the model assign polarities correctly. In the multiple-aspect case, the Explicit Self-Supervised Strategies show absolute ABSA-F1 improvements of 0.97% (80.22% → 81.19%) and 3.04% (65.16% → 68.20%) on the REST14 and REST15 datasets, respectively. This indicates that the explicit self-supervised strategies are highly effective for ABSA when the sentence contains multiple aspects. In addition, the performance gain from the Explicit Self-Supervised Strategies in Table 3 is mostly derived from the multiple-aspect cases (+0.97%); thus, our proposed model has strengths in dealing with multiple aspects.
In ABSA, it is important to accurately predict all aspects and their corresponding sentiment polarities in a sentence. Since ABSA-F1 is a word-level metric, it is limited in evaluating whether all aspects and corresponding polarities are correct. Therefore, we also evaluate our method with sentence-level accuracy: the number of sentences for which all aspects and polarities are accurately predicted, divided by the total number of sentences. Unlike ABSA-F1, the sentence-level accuracy for multiple-aspect sentences is lower than that for single-aspect sentences, which implies that identifying multiple aspects and their polarities is more challenging. In the multiple-aspect case, our Explicit Self-Supervised Strategies lead to significant sentence-level accuracy improvements of 2.54% (61.70% → 64.24%) and 3.74% (48.60% → 52.34%) on the REST14 and REST15 datasets, respectively. However, we observe only small improvements in sentence-level accuracy on both datasets when the sentence contains a single aspect. From these observations, we conclude that our proposed method is highly effective when the sentence contains multiple aspects.
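A simple implementation of this metric, assuming predictions and gold annotations are given as collections of (aspect, polarity) pairs per sentence, might look like:

```python
def sentence_level_accuracy(pred_sents, gold_sents):
    """Sentence-level accuracy: a sentence counts as correct only if every
    predicted (aspect, polarity) pair matches the gold annotation exactly.
    Representing each sentence as a set of (term, polarity) tuples is our choice."""
    correct = sum(1 for p, g in zip(pred_sents, gold_sents) if set(p) == set(g))
    return correct / len(gold_sents)

gold = [[("sushi", "POS"), ("service", "NEG")], [("pizza", "POS")]]
pred = [[("sushi", "POS"), ("service", "NEU")], [("pizza", "POS")]]
print(sentence_level_accuracy(pred, gold))  # 0.5: first sentence has a wrong polarity
```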

Conclusion
In this paper, we proposed the Deep Contextualized Relation-Aware Network (DCRAN) for aspect-based sentiment analysis. DCRAN implicitly allows interaction between subtasks in a more effective manner and introduces two explicit self-supervised strategies for deep context- and relation-aware learning. We obtained new state-of-the-art results on three widely used benchmarks.
Zhuang Chen and Tieyun Qian. 2020. Relation-aware collaborative learning for unified aspect-based sentiment analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3685-3694.
Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations.

A.4 Baselines
RACL (Chen and Qian, 2020) defines interactive relations among ATE, OTE, and ASC. It proposes relation propagation mechanisms through a stacked multi-layer network.
Dual-MRC (Mao et al., 2021) leverages two machine reading comprehension problems to solve ATE and ASC. It jointly trains two BERT-MRC models that share parameters.

A.5 Case Study
In E1 and E3, while all models correctly extract both the aspect and the opinion, RACL and DCRAN w/o (i.e., DCRAN without the explicit self-supervised strategies) make inaccurate polarity predictions based on words with superficial meaning (i.e., well prepared, disgusted). In particular, E3 expresses a sarcastic opinion about the aspect terms throughout the sentence. This suggests that these models cannot understand the authentic meaning of the sentence. On the other hand, DCRAN grasps the entire context and predicts the correct polarity for each aspect. In E2, the evidence for understanding the actual meaning of the aspect term staff is not specified in a word-level opinion but is expressed at the sentence level, as in "I hope the staff pays more attention to the little details in the future". In this case, RACL cannot extract the aspect and opinion terms, and DCRAN w/o makes inaccurate polarity predictions for the aspect term staff based on the opinion term beautiful. However, DCRAN with the Explicit Self-Supervised Strategies understands that the sentence expresses an opinion about the staff and predicts correctly.