Aspect-Category Enhanced Learning with a Neural Coherence Model for Implicit Sentiment Analysis



Introduction
Aspect-based sentiment analysis (ABSA) has been one of the major research topics in NLP since a large volume of reviews became accessible via social networking services. Much of the previous work on ABSA has focused on explicit sentiment (Yang and Zhao, 2022; Yan et al., 2021; Chen et al., 2022), while implicit sentiment, in which a sentence conveys sentiment without containing obvious sentiment polarity words, often appears in reviews. As illustrated in Figure 1, "The food here is rather good, but only if you like to wait for it." in $s_1$ cannot be clearly identified as "negative" with respect to the aspect-category "service," because it does not include any opinion words related to it. The little existing work on implicit sentiment mainly leverages intra-sentential information to exploit effective contexts, for example, syntactic information from dependency trees, which results in representations that insufficiently capture their contexts.

Figure 1: A review from SemEval-2016. Words in angle brackets refer to aspect-categories. The blue lines indicate that the review is coherent. The orange and green lines show that each "service" and "food" category points out its sentiment.
One feasible solution is to leverage one aspect of document quality: a document is coherent if it is well-written and easy to understand. Many coherence models, such as entity-based approaches (Jeon and Strube, 2022) and neural coherence models (Nguyen and Joty, 2017; Moon et al., 2019), have been proposed and applied to various NLP tasks. However, it is often the case that each sentence in a review includes several different aspect-categories, and even the same aspect-category within a review can express different sentiment polarities. As illustrated by the bold lines in Figure 1, $s_1$ has two different aspect-categories, "food" and "service," with different polarities. Likewise, "food" in $s_1$ is positive, while that of $s_5$ shows negative polarity. Most existing coherence models, obtained from the distribution of entities or adjacent sentences, are inadequate for this setting and make it hard to predict an accurate polarity of implicit sentiment.
Motivated by the issues mentioned above, we propose aspect-category enhanced learning with a neural coherence model (ELCoM) that leverages coherence information for ABSA and is especially beneficial for implicit sentiment classification. On the one hand, observing the SemEval-15 and SemEval-16 ABSA datasets, we found that (i) more than 50% of the reviews had the same sentiment polarity regardless of different aspect-categories, and (ii) more than 70% of the reviews had the same sentiment polarity if they had the same aspect-category. These observations indicate that (i) the reviewer's opinions are likely to be preserved throughout the review and (ii) the aspect-category is a strong clue for capturing sentence-level coherence to aid implicit sentiment classification. We thus utilize document-level coherence to deal with (i) and exploit the relationships among sentences and aspect-categories with a hypergraph for (ii).
On the other hand, the remaining 20∼30% of reviews include opposite polarities even within the same category, while still preserving coherence. To compensate for this side effect, we perform cross-category enhancement in the hypergraph. More specifically, we utilize self-attention (SA) filtering to offset the impact of anomalous nodes, apply a retrieval-based attention (Rba) technique (Zhang et al., 2019) to learn an enhanced embedding of the aspect term, and finally obtain sentence representations with an enhanced aspect-category.
The main contributions of our work can be summarized as follows: (1) we propose the ELCoM, which learns document-level coherence by contrastive learning and sentence-level coherence by a hypergraph, mining opinions to aid implicit sentiment classification; (2) we propose cross-category enhancement on node embeddings to offset the impact of anomalous nodes and correctly identify the sentiment of aspects that share a category but have different polarities; and (3) extensive experiments on SemEval-2015 and 2016 show that our method achieves state-of-the-art performance.

Implicit Sentiment Analysis
To date, there has been very little work on implicit sentiment analysis. To address this issue, Li et al. (2021a) proposed a supervised contrastive pre-training model that learns sentiment clues from large-scale noisy sentiment-annotated corpora. Wang et al. (2022) established a causal representation of implicit sentiment by identifying the causal effect between the sentence and the sentiment. These attempts achieved better performance, but their models treat each sentence independently and ignore how sentences are connected as well as how the entire document is organized to convey information to the reader.
In the context of leveraging a whole document for sentiment analysis, Chen et al. (2020) assumed intra- and inter-aspect sentiment preferences to classify aspect sentiment. Their approach is similar to ours in that they focused on aspect-categories to capture sentiment polarity tendencies; however, their model only explored explicit sentiment classification. Cai et al. (2020) attempted implicit aspect-category detection and category-oriented sentiment classification by applying a hierarchical graph convolutional network. Their approach is also similar to ours in that they consider document-level information. The difference is that we leverage contextual features by utilizing both document- and sentence-level coherence based on aspect-categories to learn a more fine-grained contextual representation.

Coherence Analysis
With the success of deep learning (DL) techniques, many authors have attempted to apply DL to learn features for coherence. One line of work models coherence as the relationship between adjacent sentences, including a CNN (Nguyen and Joty, 2017), a hierarchical RNN (Nadeem and Ostendorf, 2018), and an attention mechanism (Liao et al., 2021). Another line models the coherence of a whole document, including inter-sentence discourse relations (Moon et al., 2019) as well as word- and document-level coherence (Farag and Yannakoudakis, 2019). Our approach spans both lines and provides a comprehensive framework for sentiment coherence.
Most attempts to apply neural coherence models to NLP tasks, such as text generation (Parveen et al., 2016; Guan et al., 2021), summarization (Eva et al., 2019; Goyal et al., 2022), and text quality assessment (Farag et al., 2018; Mesgar and Strube, 2018), emphasize capturing the global level of coherence in the text. Yang and Li (2021) proposed exploiting coherency sentiment representations to help implicit sentiment analysis, focusing on the local (word-level) aspect of sentiment coherency within the target sentence. Our ELCoM captures document-level coherence by contrastive learning and sentence-level coherence by a hypergraph; in this way, our model is sensitive to both global and local patterns, even with small training data such as the SemEval sentiment datasets.

Task Definition
Let $D=\{s_i\}_{i=1}^{I}$ be an input review consisting of $I$ sentences, where the $i$-th sentence $s_i=\{w_j\}_{j=1}^{n}$ consists of $n$ words. Let $a_{s_i}=\{a_{s_i}^{t}\}_{t=1}^{T}$ also be a set consisting of the $T$ aspects within $s_i$. Each $a_{s_i}^{t}$ consists of a pair of an aspect-category and an aspect term. Let $c=\{\text{positive}, \text{negative}, \text{neutral}\}$ be the label set of sentiment polarities. The goal of the ABSA task is, for the given $t$-th aspect in $s_i$, to predict the sentiment label $l(a_{s_i}^{t})$.
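To make the task definition concrete, the following is a minimal sketch of the input structures it describes. The container names (`Aspect`, `Sentence`) and the category strings are hypothetical illustrations, not names from the paper.

```python
from dataclasses import dataclass
from typing import List

POLARITIES = {"positive", "negative", "neutral"}  # the label set c

@dataclass
class Aspect:
    category: str   # e.g. "service#general" (hypothetical format)
    term: str       # aspect term, or "null" for implicit targets
    polarity: str   # the gold label l(a_t), one of POLARITIES

@dataclass
class Sentence:
    words: List[str]       # s_i = {w_j}
    aspects: List[Aspect]  # the T aspects a_{s_i} of this sentence

def review_categories(review: List[Sentence]) -> List[str]:
    """Collect the distinct aspect-categories mentioned in a review D."""
    seen = []
    for s in review:
        for a in s.aspects:
            if a.category not in seen:
                seen.append(a.category)
    return seen

review = [
    Sentence("Average to good Thai food , but terrible delivery".split(),
             [Aspect("food#quality", "food", "positive"),
              Aspect("service#general", "delivery", "negative")]),
    Sentence("I've waited over one hour for food".split(),
             [Aspect("service#general", "null", "negative")]),
]
print(review_categories(review))  # ['food#quality', 'service#general']
```

Note how a single sentence can carry several aspects with different polarities, which is exactly the situation the cross-category enhancement later addresses.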

Approach
The overall architecture of the ELCoM is illustrated in Figure 2. It comprises three key steps: (1) representation learning with XLNet, (2) coherence modeling (CoM) to capture document-level and sentence-level coherence, and (3) cross-category enhancement to mitigate the influence of anomalous nodes.

Representation Learning with XLNet
We utilized XLNet (Yang et al., 2019) as the backbone model to obtain the sentence representation related to the target aspect and the review document representation. XLNet is known to improve performance especially on tasks involving longer text sequences, e.g., text summarization, text classification, and text quality assessment. It has also been used to model coherence representations (Jwalapuram et al., 2022), because it takes advantage of both the autoregressive model and BERT (Devlin et al., 2019). Formally, for a sequence $x$ of length $I$, there are $I!$ possible orders for autoregressive factorization. Let $\mathcal{Z}_I$ be the set of all permutations of the length-$I$ index sequence $[1,2,\cdots,I]$, and let $z_\tau$ and $z_{<\tau}$ denote the $\tau$-th element and the first $\tau-1$ elements of a permutation $z\in\mathcal{Z}_I$. The objective of XLNet is given by:

$$\max_{\theta}\;\mathbb{E}_{z\sim\mathcal{Z}_I}\left[\sum_{\tau=1}^{I}\log p_{\theta}\left(x_{z_\tau}\mid x_{z_{<\tau}}\right)\right]. \quad (1)$$

As such, the permutation language modeling objective enables XLNet to effectively model the coherence across the document-level review.
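The permutation language modeling objective above can be illustrated with a toy sketch. The conditional model here is a deliberately trivial stand-in (uniform over a toy vocabulary), so the point is only the factorization structure: the log-likelihood is averaged over permutation orders, which XLNet samples rather than enumerates.

```python
import math
from itertools import permutations

VOCAB = ["the", "food", "is", "good", "here"]

def toy_cond_prob(token, context):
    """Hypothetical conditional p(x_t | context); uniform for illustration."""
    return 1.0 / len(VOCAB)

def permutation_lm_objective(tokens):
    """Average log-likelihood over ALL factorization orders in Z_I.
    Enumerating every permutation is feasible only for toy inputs;
    XLNet instead samples one order per training step."""
    total, count = 0.0, 0
    for order in permutations(range(len(tokens))):
        ll = 0.0
        for tau, idx in enumerate(order):
            context = [tokens[i] for i in order[:tau]]
            ll += math.log(toy_cond_prob(tokens[idx], context))
        total += ll
        count += 1
    return total / count

score = permutation_lm_objective(["the", "food", "is", "good"])
# With a uniform model, every order contributes 4 * log(1/5).
```

A real model's conditionals depend on the context, so different factorization orders expose each token to different bidirectional contexts, which is what makes the objective useful for coherence.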
Specifically, we create an input sequence for each aspect $a_{s_i}^{t}$ that appears in the target sentence $s_i=\{w_j\}_{j=1}^{n}$. The input is padded with two special symbols, [SEP] and [CLS], in the same manner as BERT. We apply XLNet to the input and obtain each word embedding $e_{w_j}\in\mathbb{R}^{d_m}$ and the aspect-based sentence embedding $e_s\in\mathbb{R}^{d_m}$ marked with [CLS], where $d_m$ is the dimension size. Likewise, given the input document $D=\{s_i\}_{i=1}^{I}$, we concatenate its sentences to create an input document sequence. Applying XLNet to this sequence, we obtain the document embedding $e_d\in\mathbb{R}^{I\times d_m}$ marked with [CLS], which contains both document- and sentence-level representations.
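The two inputs described above can be sketched as simple token-sequence builders. The exact template is not given in the paper, so the pairing of sentence and aspect with [SEP], and the trailing [CLS] (XLNet conventionally places its classification token at the end), are assumptions for illustration.

```python
def build_aspect_input(sentence_words, aspect_term):
    """Sentence-level input for one aspect a_t: XLNet over this sequence
    yields the word embeddings e_w and the [CLS] sentence embedding e_s."""
    return sentence_words + ["[SEP]"] + aspect_term.split() + ["[SEP]", "[CLS]"]

def build_document_input(sentences):
    """Document-level input: the review's sentences concatenated; XLNet
    over this sequence yields the document embedding e_d."""
    seq = []
    for words in sentences:
        seq += words + ["[SEP]"]
    return seq + ["[CLS]"]

tokens = build_aspect_input("The food here is rather good".split(), "food")
```

A tokenizer for the actual XLNet checkpoint would of course replace this string-level sketch in practice.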

Coherence Modeling
Document-Level Coherence with Contrastive Learning. Following Jwalapuram et al. (2022), we adopt the sentence ordering task with contrastive learning to learn robust coherence representations. It enforces that the coherence score of the positive sample (the original document) should be higher than that of a negative sample (a disordered document). We therefore disorder the original review to generate $B$ negative samples by randomly shuffling the sentences within the document, and apply contrastive learning to separate the coherent and incoherent representations. Let $f_\theta(e_d)$ be a linear projection that converts the document embedding $e_d$ into a coherence score. The margin-based contrastive loss is given by:

$$\mathcal{L}_{cl}=\sum_{b=1}^{B}\max\left(0,\;\tau-f_{\theta}(e_{d}^{+})+f_{\theta}(e_{d_b}^{-})\right), \quad (2)$$

where $f_{\theta}(e_{d}^{+})$ is the coherence score of the positive sample, $f_{\theta}(e_{d_b}^{-})$ that of the $b$-th negative sample, and $\tau$ the margin.

Sentence-Level Coherence by Hypergraph. Recall that more than 70% of reviews have the same sentiment polarity if they have the same aspect-category, indicating that the aspect-category is beneficial for sentiment identification. We thus utilize a hypergraph to exploit the relationships among sentences and aspect-categories. A hypergraph is a variant of a graph in which a hyperedge connects any number of vertices, whereas in graph-based methods, e.g., graph convolutional networks (GCNs), an edge connects only two vertices (Yu and Qin, 2019; Wang et al., 2023). We define the aspect-categories ($C$ in total) as hyperedges and the sentences as nodes, giving the incidence matrix $H\in\mathbb{R}^{I\times C}$: each row corresponds to a sentence and each column to the hyperedge of an aspect-category, with $H_{ij}=1$ if vertex $i$ is connected by hyperedge $j$ and $H_{ij}=0$ otherwise. Note that multiplying the document embedding $e_d$ by the set of multi-hot vectors ($HH^{\top}\in\mathbb{R}^{I\times I}$) can be regarded as a slicing operation that selects the embeddings of the sentences in the same category. The representation of the sentences toward the same aspect, $e_h\in\mathbb{R}^{I\times d_m}$, is obtained by:

$$e_h = HH^{\top} e_d. \quad (3)$$
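The slicing behavior of Eq. (3) can be checked with a pure-Python sketch on toy sizes: the incidence matrix $H$ marks which sentences (rows) mention which aspect-categories (columns), and $HH^{\top}$ then sums, for each sentence, the embeddings of all sentences sharing at least one category with it.

```python
def matmul(A, B):
    """Naive matrix product for small toy matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# 3 sentences, 2 aspect-categories:
# s0 -> cat0, s1 -> cat0 and cat1, s2 -> cat1
H = [[1, 0],
     [1, 1],
     [0, 1]]
e_d = [[1.0, 0.0],   # toy 2-dim sentence embeddings from the document encoder
       [0.0, 1.0],
       [2.0, 2.0]]

e_h = matmul(matmul(H, transpose(H)), e_d)
# Row 0 aggregates s0 and s1 (both in cat0): [1.0, 1.0]
```

Sentences that share no category contribute nothing to each other's row, which is the "slicing" intuition in the text; in the model the same product is a dense matrix multiplication over the real $I \times d_m$ embeddings.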

Cross-Category Enhancement
We observed that 20∼30% of reviews contain different polarities even in the same category, while these reviews preserve coherence. The document- and sentence-level coherence modeling thus often leads to error propagation, as it learns information about both polarities from other sentences during training.
To alleviate this issue, we perform cross-category enhancement on the node embeddings in the hypergraph, as illustrated in Figure 2. More precisely, (1) we utilize self-attention (SA) to reduce the influence between anomalous nodes, and (2) we apply a retrieval-based attention technique, Rba, to obtain an enhanced embedding of the aspect term.
Self-Attention (SA) Filtering. We assume that if the aspect-category of the target query sentence carries a different sentiment from the same aspect in other sentences, the SA weight should be small so as to dilute its features. We use the SA mechanism of the Transformer, which is given by:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_m}}\right)V,$$

where $Q$, $K$, and $V$ refer to the query, key, and value matrices obtained by linear transformations of $e_h$, respectively. The result is fed into a feed-forward network, combined with layer normalization and a residual connection. Each encoder layer takes the output of the previous layer as its input, which allows attention to be paid to all positions of the previous layer. The results are passed to average pooling, and we obtain the filtered sentence representation $\hat{e}_h\in\mathbb{R}^{d_m}$.

Syntactic Representation Learning. We use the Stanford parser to obtain a dependency tree of the input sentence and apply graph convolution operations to learn high-order correlations among the words. Given the syntactic adjacency matrix $A\in\mathbb{R}^{n\times n}$ of the sentence and the embedding set of nodes (words) $e^{(0)}=[e_{w,1}, e_{w,2}, \ldots, e_{w,n}]$, the node representations are updated as follows:

$$e_i^{(l)}=\mathrm{ReLU}\left(\frac{1}{d_i}\sum_{j=1}^{n}A_{ij}W^{(l)}e_j^{(l-1)}+b^{(l)}\right),$$

where $e_j^{(l-1)}\in\mathbb{R}^{d_m}$ denotes the $j$-th word representation from the previous GCN layer, $e_i^{(l)}$ refers to the $i$-th word representation of the current GCN layer, and $d_i=\sum_{j=1}^{n}A_{ij}$ is the degree of the $i$-th token in the dependency tree. The weights $W^{(l)}$ and bias $b^{(l)}$ are trainable parameters.
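The degree-normalized GCN update above can be sketched in pure Python. To keep the toy example checkable by hand, the weight matrix $W$ defaults to the identity and the bias $b$ to zero; both would be trainable in the model.

```python
def gcn_layer(A, E, W=None, b=None):
    """One GCN update: e_i <- ReLU((1/d_i) * sum_j A_ij W e_j + b)."""
    n = len(A)
    d = len(E[0])
    W = W or [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = b or [0.0] * d
    out = []
    for i in range(n):
        deg = sum(A[i]) or 1.0            # d_i, guarding isolated nodes
        agg = [0.0] * d
        for j in range(n):
            if A[i][j]:
                We = [sum(W[r][c] * E[j][c] for c in range(d)) for r in range(d)]
                agg = [a + A[i][j] * w for a, w in zip(agg, We)]
        out.append([max(0.0, a / deg + bb) for a, bb in zip(agg, b)])  # ReLU
    return out

# Dependency adjacency (with self-loops) for a 3-word toy sentence.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
E = [[1.0, -1.0], [2.0, 0.0], [0.0, 3.0]]
H1 = gcn_layer(A, E)
```

Stacking two such layers, as in the implementation details, lets each word aggregate information from its two-hop neighborhood in the dependency tree.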
We apply Rba to the output of the GCN. Specifically, we mask out the non-aspect words in the GCN output to obtain the masked representation $e_m$. The attention weight $\alpha_j$ of word $w_j$ is obtained by normalizing a score $\beta_j$, which measures the semantic relatedness between the aspect and the words other than the aspect in the sentence. The enhanced embedding of the aspect term is then formulated as $e_r=\sum_{j=1}^{n}\alpha_j e_{s_j}$.
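A hedged sketch of this Rba step follows: non-aspect positions are masked out, a relatedness score $\beta_j$ is computed between the masked aspect representation and every word, and the normalized weights $\alpha_j$ re-weight the word embeddings. The dot-product scoring and the averaging over aspect positions are assumptions for illustration; the paper only states that $\beta_j$ measures semantic relatedness.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def rba(word_embs, aspect_positions):
    """Toy retrieval-based attention over a sentence's word embeddings."""
    d = len(word_embs[0])
    # e_m: average of the aspect-word embeddings; all other positions masked
    e_m = [sum(word_embs[p][k] for p in aspect_positions) / len(aspect_positions)
           for k in range(d)]
    beta = [dot(e_m, e) for e in word_embs]       # relatedness scores beta_j
    alpha = softmax(beta)                          # attention weights alpha_j
    e_r = [sum(a * e[k] for a, e in zip(alpha, word_embs)) for k in range(d)]
    return alpha, e_r                              # e_r = sum_j alpha_j e_j

alpha, e_r = rba([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], aspect_positions=[2])
```

Words unrelated to the aspect receive low $\beta_j$, so their contribution to the enhanced aspect embedding $e_r$ is diluted.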
The concatenated representation $e$ is passed to a linear transformation layer and the softmax function to obtain a probability score $p\in\mathbb{R}^{|c|}$:

$$p=\mathrm{softmax}(W_p e + b_p),$$

where $W_p\in\mathbb{R}^{|c|\times d_m}$ and $b_p$ are the weight and bias term, respectively. The task is trained with the cross-entropy loss:

$$\mathcal{L}_{absa}=-\sum_{k=1}^{|c|}\gamma_k \log p_k,$$

where $\gamma\in\mathbb{R}^{|c|}$ denotes the true label vector.
Recall that we utilize the margin-based contrastive loss $\mathcal{L}_{cl}$ to train the sentence ordering task. The final loss is given by:

$$\mathcal{L}=\delta_1\mathcal{L}_{absa}(\phi^{(sh)},\phi_1)+\delta_2\mathcal{L}_{cl}(\phi^{(sh)},\phi_2),$$

where $\phi^{(sh)}$ indicates the shared parameters, $\phi_1$ and $\phi_2$ stand for the parameters estimated in the ABSA task and the sentence ordering task, respectively, and $\delta_1,\delta_2\in[0,1]$ are hyperparameters used to balance the weights of the two tasks.
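The multi-task objective above can be sketched numerically: the final loss combines the ABSA cross-entropy with the margin-based contrastive loss of the sentence ordering task, weighted by $\delta_1$ and $\delta_2$. All scores below are toy numbers, not model outputs.

```python
import math

def cross_entropy(p, gold_index):
    """L_absa for a one-hot label vector gamma."""
    return -math.log(p[gold_index])

def margin_contrastive(pos_score, neg_scores, margin=0.1):
    """Hinge on each negative: the original document should outscore
    every shuffled document by at least the margin tau."""
    return sum(max(0.0, margin - pos_score + n) for n in neg_scores)

def final_loss(p, gold_index, pos_score, neg_scores, d1=0.9, d2=0.1):
    """L = delta_1 * L_absa + delta_2 * L_cl, with the paper's delta values."""
    return d1 * cross_entropy(p, gold_index) + d2 * margin_contrastive(pos_score, neg_scores)

loss = final_loss(p=[0.7, 0.2, 0.1], gold_index=0,
                  pos_score=0.8, neg_scores=[0.3, 0.75, 0.9])
```

Note that only negatives scoring within the margin of the positive (here 0.75 and 0.9) contribute to the hinge; a clearly incoherent shuffle (0.3) costs nothing.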

Data and Evaluation Metrics
We conducted experiments on four benchmark datasets: REST15 and LAP15 from SemEval-2015 task 12 (Pontiki et al., 2015), and REST16 and LAP16 from SemEval-2016 task 5 (Pontiki et al., 2016). The datasets cover the restaurant and laptop domains with positive, neutral, and negative sentiment polarities. SemEval-2016 is labeled only with explicit sentiments; we thus manually annotated implicit sentiment labels in the dataset. The data statistics are shown in Table 1.
We used accuracy ACC (%) and macro-averaged F1 (%) scores as metrics. We also evaluated our model with implicit and explicit sentiment accuracy, IAC (%) and EAC (%), respectively. For a fair evaluation, we ran each experiment five times and report the average results.
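For clarity, the reported metrics can be sketched as follows: overall accuracy over all examples, and F1 computed per polarity class and then macro-averaged; IAC/EAC are the same accuracy restricted to the implicit/explicit subsets.

```python
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred, labels=("positive", "negative", "neutral")):
    f1s = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)  # unweighted average over classes

gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
acc = accuracy(gold, pred)  # 0.75
```

Macro-averaging weighs the rare neutral class equally with the frequent ones, which is why F1 can diverge sharply from ACC on these skewed datasets.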

Implementation Details
Following Chen et al. (2020) and Cai et al. (2020), we randomly chose 10% of the training data as development data. The optimal hyperparameters are as follows: the initial learning rate for coherence modeling was 6e-6 and 2e-5 for the others; the weight decay was 1e-3; the dropout rate was 0.1; the number of negative samples $B$ was 5; the margin $\tau$ was 0.1; the balance coefficients $\delta_1$ and $\delta_2$ were 0.9 and 0.1, respectively; and the number of graph convolutional layers was 2. All hyperparameters were tuned with Optuna; the search ranges are reported in Appendix A.3. We used AdamW (Loshchilov and Hutter, 2017) as the optimizer.

Baselines
We compared our approach with the following baselines:

Results and Discussion

Performance Comparison
Table 2 shows the results. Overall, the ELCoM attained an improvement over the second-best methods of 0.63∼2.31% ACC and 0.57∼1.08% F1, except for the F1-score on the REST16 dataset. In particular, it achieved remarkable results in implicit sentiment polarity classification: the ELCoM improved IAC over the second-best method by 2.62∼14.81%, while its EAC improvement was 0.24∼2.36%. This reveals that leveraging document- and sentence-level coherence and reducing the influence of anomalous sentences significantly benefit sentiment analysis. Table 2 also provides the following observations and insights:
• Most baselines suffer from implicit sentiment analysis, while the ELCoM breaks this bottleneck and maintains good performance.
• SCAPT, which exploits pre-training on a large sentiment corpus and regards implicit and explicit sentiments as contrastive pairs, is competitive among the baselines and especially effective in the laptop domain. This indicates that prior knowledge is beneficial for sentiment analysis.
• GNN-based methods such as MGFN and SSEGCN achieved inspiring results, suggesting that word-level syntactic representation can enrich sentiment features, while they ignore that the reviewer's opinions are likely to be preserved throughout the review.
• The ELCoM and CoGAN exploit document-level sentiment knowledge, suggesting that capturing the reviewer's consistent sentiment expressions contributes to improved performance.

Ablation Study
We conducted an ablation study to examine the effect of each component of the ELCoM. The results in Table 2 show the following:
• The ELCoM without coherence modeling (w/o CoM) shows a performance drop, particularly on REST15 by F1, indicating that document-level coherence learned from the sentence ordering task contributes to improving performance.
• The ELCoM without cross-category enhancement (w/o CCE) suffers a severe performance drop, particularly on LAP16 by F1, indicating that the SA mechanism we use to reduce the influence of anomalous nodes is effective for accurate sentiment analysis.
Figure 4: Case study on REST16 data with polarities predicted by BERT-SPC, B-SCAPT, CLEAN, and our approach: ✔ (or ✘) denotes that the predicted sentiment polarity is correct (or incorrect).

Recall that our cross-category enhancement utilizes two attention mechanisms, SA filtering and Rba, to offset the impact of anomalous nodes. To examine the effectiveness of each mechanism, we performed the experiments shown in Table 3. Overall, the enhancement improved performance, for instance, by 1.36∼2.93% ACC and 4.14∼17.72% F1 across all datasets. Specifically, SA filtering yields more benefit for REST15 and LAP15, while Rba works well for the other datasets.
It is interesting to note how cross-category enhancement, which deals with anomalous nodes, affects performance. Table 4 shows the results against SSEGCN and B-SCAPT, focusing on reviews containing opposite polarities in the same category. With CCE, the model improves by 1.27∼5.76% ACC and 4.86∼8.94% F1, which clearly supports the effectiveness of our cross-category enhancement. To better understand the ablation study, we visualized the distribution of sentence representations of each module by t-SNE (Van der Maaten and Hinton, 2008), as illustrated in Figure 3. We can see that (1) XLNet without CoM and cross-category enhancement only roughly identifies the sentiment: $a_2$ is close to $a_9$, $a_{10}$, and $a_{11}$, although they have opposite polarities; (2) XLNet with CoM shows that the sentences that mention the same aspect are grouped together and share sentiments; and (3) XLNet with CoM and cross-category enhancement shows that the dots in the same aspect-category are better clustered, while different aspect-categories are dispersed.

Figure 5: Illustration of over-capturing contexts. Words such as "great" and "good" overly affect the negative opinion word "thin."

Efficacy of Coherence-Based Contexts
We compared the coherence-based contexts with SA- and LSTM-based ones to verify the effectiveness of the ELCoM, as these techniques are well known for effectively learning context dependencies. The results are shown in Table 5. It is reasonable that SA works better than LSTM, as SA handles long-range dependencies more directly. However, the results also reveal that SA may over-capture the context of irrelevant words even when the sentiment of the target aspect is explicit. As shown in Figure 5, $s_2$ contains two aspects: "food quality" and "food style." The SA-based context incorrectly predicts the sentiment polarity toward the latter aspect, as it overly captures positive sentiment from the words "great," "good," and "fresh" as descriptors of this aspect, ignoring the dramatic sentiment transition that follows. In contrast, our approach captures not only the original sentiment in the sentence but also the context of sentiments. From the experimental results, we conclude that coherence-based contexts are currently the best alternative to SA- and LSTM-based contexts in sentiment analysis tasks.

Case Study
We highlighted typical and difficult examples and compared the ELCoM with the baselines. We chose BERT-SPC, B-SCAPT, and CLEAN as baselines, since BERT-SPC is often used as a benchmark model, and the others focus on implicit sentiment analysis. Figure 4 illustrates the results.
Case 1. BERT-SPC and B-SCAPT failed on the implicit sentiment in $s_4$, as they could not correctly identify the sentiment of "service," which appears in $s_1$. Likewise, $s_5$ contains two aspects in the same category of "food," and its complex syntactic structure leads BERT-SPC to incorrectly classify the "salsa (food)" aspect as positive. To identify these aspects correctly, CLEAN infers causal representations of the implicit sentiments in $s_1$ and $s_5$, and the ELCoM learns coherent contexts.
Case 2. The sentiment of $s_5$ shows an implicit negative polarity toward the "restaurant." The ELCoM captures the polarity of the sentiment in $s_4$ via sentence-level coherence through the hypergraph toward the same aspect-category to assist sentiment classification.
Case 3. It is extremely difficult to analyze the sentiment of short sentences, such as "No comparison" in case 3. In contrast to the baselines, the ELCoM can capture the context of sentiment from $s_2$ and $s_3$ as auxiliary information.

Error Analysis on Explicit and Implicit Sentiments
We conducted error analyses on the four datasets and found that the implicit sentence accuracy of the ELCoM is better than that of explicit sentences in some cases. There are three possible reasons:
• One reason is the effectiveness of coherence modeling (CoM).
• For explicit sentences, (1) neutral sentiments are often too ambiguous to identify, and neutral samples are insufficient in the training sets, and (2) many sentences with mixed (positive/negative) sentiment polarities were incorrectly identified, caused by negation words and unspecified referents within the sentences.
• The majority of implicit sentences do not carry both positive and negative sentiment polarities, as a user often states an objective fact to express an implicit opinion, which means a lower probability of containing more than one sentiment. In contrast, many more explicit sentences have mixed sentiment polarities. In the example from the REST16 test set shown in Figure 6, only one sentiment polarity, negative, appears in the implicit sentence, while the sentence with explicit sentiment includes both positive and negative sentiments.

Conclusion
We proposed aspect-category enhanced learning with a neural coherence model (ELCoM) for implicit sentiment analysis. To mine opinions from explicit sentences to aid implicit sentiment classification, the ELCoM captures document-level coherence by contrastive learning and sentence-level coherence by a hypergraph. To further offset the impact of anomalous nodes in hyperedges, we proposed cross-category enhancement on node embeddings.
Extensive experiments have shown that the ELCoM achieves competitive performance against state-of-the-art sentiment analysis methods. Future work includes (i) improving the ELCoM by pre-training on a large sentiment corpus, and (ii) extending the ELCoM to simultaneously detect aspect-categories and their polarities (Cai et al., 2020).

A Appendix
A.1 Consistency of sentiment polarity.
Figure 7 shows the ratio of training and test reviews having consistent sentiment. We can see that 50.3%∼57.4% of the reviews had the same sentiment polarity regardless of different aspect-categories over the review, and 70.8%∼84.8% of the reviews had the same sentiment polarity if they had the same aspect-category.

(b) of Figure 8 shows sentence-level coherence. The sentiments of $s_1$ and $s_5$ toward $a_1$ are enhanced by each other and avoid ineffective propagation from sentiments in irrelevant aspect-categories, such as $s_4$ or the sentiment of $s_1$ related to aspect-category $a_3$. In contrast, (c) of Figure 8 illustrates cross-category enhancement to compensate for the side effects from sentences that contain different polarities even in the same category. For example, in Figure 8 (c), the sentiment of $s_5$ related to aspect-category $a_2$ is prone to be incorrectly classified as positive, while the impact of $s_2$ and $s_3$ can be offset by cross-category enhancement.

A.4 Example of input data for multi-task learning
As shown in Table 7, the input data for multi-task learning is as follows: the input of the sentence ordering task consists of the original review and its disordered version, and the input of the ABSA task comprises "Text" and "Opinions." Following Li et al. (2021b), we labeled a "Text" as implicit if it does not contain any obvious opinion words for a certain aspect. The $B$ disordered reviews are generated from the original reviews by randomly shuffling the sentences, as illustrated in Table 7.
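The negative-sample generation described above can be sketched as follows: each of the $B$ negatives is the original review with its sentences shuffled, and exact copies of the original order are rejected. The fixed seed is only for reproducibility of the sketch.

```python
import random

def disorder(review, B, seed=0):
    """Generate B disordered copies of a review for the sentence ordering task."""
    rng = random.Random(seed)
    negatives = []
    while len(negatives) < B:
        shuffled = review[:]
        rng.shuffle(shuffled)
        if shuffled != review:       # keep only genuinely disordered copies
            negatives.append(shuffled)
    return negatives

review = ["s1: Average to good Thai food, but terrible delivery.",
          "s2: I've waited over one hour for food.",
          "s3: They were very abrupt with me when I called.",
          "s4: A Thai restaurant out of rice during dinner?",
          "s5: The food arrived 20 minutes after I called, cold and soggy."]
negs = disorder(review, B=5)
```

Each negative preserves the review's sentence multiset but breaks its order, which is exactly what the coherence score in the contrastive loss must penalize.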
Original Review [Review rid="1726473"]
Text s1: Average to good Thai food, but terrible delivery.
Opinions: target="null" category="service#general" polarity="negative" implicit_sentiment="False"
Text s3: They were very abrupt with me when I called and actually claimed the food was late because they were out of rice.

Disordered Review
where $f_{\theta}(e_{d}^{+})$ indicates the coherence score of the positive sample, $f_{\theta}(e_{d_1}^{-}),\ldots,f_{\theta}(e_{d_B}^{-})$ denote the scores of the $B$ negative samples, and $\tau$ is the margin.

Figure 2: The architecture of the ELCoM. It comprises representation learning with XLNet, coherence modeling, and cross-category enhancement, with the input data consisting of original reviews and the target ABSA sentences. The final output is obtained through sentiment polarity classification.

Figure 3: Visualization of the sentence representations. Each color corresponds to an aspect-category, and bold, underlined words refer to the aspect term. The dotted line separates positive and negative polarities.

Figure 6: Examples of explicit and implicit sentiments. Underlined words indicate aspect terms.

Figure 7: The ratio of reviews having consistent sentiment in different datasets. Consistency Over Review (COR): the ratio of reviews in which sentiment polarities are consistent. Consistency Over Category (COC): the ratio of reviews in which sentiment polarities in the same aspect-category are consistent.

Figure 8: Illustration of document- and sentence-level coherence, and cross-category enhancement.
Text s4: A Thai restaurant out of rice during dinner?
Text s1: Average to good Thai food, but terrible delivery.
Text s5: The food arrived 20 minutes after I called, cold and soggy.
Text s3: They were very abrupt with me when I called and actually claimed the food was late because they were out of rice.
Text s2: I've waited over one hour for food.

Table 1: Statistics of the ABSA dataset. COR and COC indicate the ratio of reviews in which sentiment polarities are consistent, and in which sentiment polarities in the same aspect-category are consistent, respectively.

We use the multi-task learning (MTL) framework to optimize both the ABSA task and the sentence ordering task. For the ABSA task, sentence representations with an enhanced aspect-category are obtained by $e=[e_s,\hat{e}_h,e_r]$. The result is passed to a linear transformation layer, and using the softmax function we obtain a probability score $p\in\mathbb{R}^{|c|}$.

Table 2: Main results for the four datasets. IAC and EAC refer to the accuracy of implicit and explicit sentences, respectively. "w/o CoM" refers to the result without any coherence information. # indicates results from the original papers. Note that all IACs on LAP15 are 100.0; the reason is that there are only four implicit sentiment sentences, all of which are identified correctly by all baselines.
AAGCN 86.79 68.22 84.84 86.96 85.65 72.43 100.0 85.64 92.02 77.51 87.80 92.13 85.90 71.58 69.70 86.15
Sentic-GCN 85.89 70.67 84.54 86.09 85.82 72.89 100.0 85.87 91.23 79.31 85.36 91.37 85.27 71.61 69.70 85.78

Table 3: Performance of cross-category enhancement. "Saf" indicates SA filtering, and "Rba" indicates retrieval-based attention. ✔ and ✘ denote with/without each module, respectively.

Table 4: Results on reviews that contain different polarities in the same category.

Table 5: Comparison against several network models.

Table 6: Search range of each hyperparameter. LR refers to the learning rate; LR of CoM is the learning rate of coherence modeling; LR of others covers aspect-based sentiment analysis and cross-category enhancement.