Powering Comparative Classification with Sentiment Analysis via Domain Adaptive Knowledge Transfer

We study Comparative Preference Classification (CPC), which aims at predicting whether a preference comparison exists between two entities in a given sentence and, if so, which entity is preferred over the other. High-quality CPC models can significantly benefit applications such as comparative question answering and review-based recommendation. Among the existing approaches, non-deep learning methods suffer from inferior performance. The state-of-the-art graph neural network-based ED-GAT (Ma et al., 2020) only considers syntactic information while ignoring the critical semantic relations and the sentiments toward the compared entities. We propose the Sentiment Analysis Enhanced COmparative Network (SAECON), which improves CPC accuracy with a sentiment analyzer that learns sentiments toward individual entities via domain adaptive knowledge transfer. Experiments on the CompSent-19 (Panchenko et al., 2019) dataset show a significant improvement in F1 scores over the best existing CPC approaches.


Introduction
Comparative Preference Classification (CPC) is a natural language processing (NLP) task that predicts whether a preference comparison exists between two entities in a sentence and, if so, which entity is preferred. For example, given the sentence Python is better suited for data analysis than MATLAB due to the many available deep learning libraries, a decisive comparison exists between Python and MATLAB, and Python is preferred over MATLAB in this context.
The CPC task can profoundly impact various real-world application scenarios. Search engine users may query not only factual questions but also comparative ones to meet their specific information needs (Gupta et al., 2017). Recommendation providers can analyze product reviews containing comparative statements to understand the advantages and disadvantages of a product compared with similar ones.
(* These authors contributed equally.)
Several models have been proposed to solve this problem. Panchenko et al. (2019) first formalize the CPC problem, build and publish the CompSent-19 dataset, and experiment with numerous general machine learning models such as Support Vector Machines (SVM), representation-based classification, and XGBoost. However, these attempts treat CPC as a sentence classification task while ignoring the semantics and the contexts of the entities (Ma et al., 2020).
ED-GAT (Ma et al., 2020) marks the first entity-aware CPC approach that captures long-distance syntactic relations between the entities of interest by applying graph attention networks (GAT) to dependency parsing graphs. However, we argue that the disadvantages of such an approach are clear. Firstly, ED-GAT replaces the entity names with "entityA" and "entityB" for simplicity and hence discards their semantics. Secondly, ED-GAT has a deep architecture with ten stacking GAT layers to tackle the long-distance issue between compared entities; more GAT layers result in a heavier computational workload and reduced training stability. Thirdly, although the competing entities are typically connected via multiple hops of dependency relations, the unordered tokens along the connection path cannot capture either global or local high-quality semantic context features.
In this work, we propose the Sentiment Analysis Enhanced COmparative classification Network (SAECON), a CPC approach that considers not only syntactic but also semantic features of the entities. The semantic features here refer to the context of the entities from which a sentiment analysis model can infer the sentiments toward the entities. Specifically, the encoded sentence and entities are fed into a dual-channel context feature extractor to learn the global and local context. In addition, an auxiliary Aspect-Based Sentiment Analysis (ABSA) module is integrated to learn the sentiments toward individual entities, which greatly benefits the comparison classification.
ABSA aims to detect the specific emotional inclination toward an aspect within a sentence (Ma et al., 2018; Hu et al., 2019; Phan and Ogunbona, 2020; Chen and Qian, 2020; Wang et al., 2020). For example, the sentence I liked the service and the staff but not the food suggests positive sentiments toward service and staff but a negative one toward food. These aspect entities, such as service, staff, and food, are studied individually.
The well-studied ABSA approaches can benefit CPC when the compared entities in a CPC sentence are treated as the aspects in ABSA. Incorporating the individual sentiments learned by ABSA methods into CPC has several advantages. Firstly, for a comparison to hold, the preferred entity usually receives a positive sentiment while its rival gets a relatively negative one. These sentiments can be readily extracted by strong ABSA models, and the contrast between the sentiments assigned to the compared entities provides a vital clue for accurate CPC. Secondly, ABSA models are designed to target the sentiments toward phrases, which bypasses the complicated and noisy syntactic relation path. Thirdly, considering the scarcity of data resources for CPC, the abundant annotated data for ABSA can provide sufficient supervision signals to improve the accuracy of CPC.
There is one challenge that blocks the knowledge transfer of sentiment analysis from the ABSA data to the CPC task: domain shift. Existing ABSA datasets are centered around specific topics such as restaurants and laptops, while the CPC data has mixed topics (Panchenko et al., 2019) that are all distant from restaurants. In other words, the sentences of the ABSA and CPC datasets are drawn from different distributions, also known as domains. The difference between the distributions is referred to as a "domain shift" (Ganin and Lempitsky, 2015; He et al., 2018), and it is harmful to accurate knowledge transfer. To mitigate the domain shift, we design a domain adaptive layer to remove domain-specific features such as topics and preserve domain-invariant features such as sentiments, so that the sentiment analyzer can smoothly transfer knowledge from sentiment analysis to comparative classification.
Related Work

Comparative Preference Classification
CPC originates from the task of Comparative Sentence Identification (CSI) (Jindal and Liu, 2006), which aims to identify comparative sentences. Jindal and Liu (2006) approach this problem with Class Sequential Rules (CSR) and a Naive Bayes classifier. Building upon CSI, Panchenko et al. (2019) propose the CPC task, release the CompSent-19 dataset, and conduct experimental studies using traditional machine learning approaches such as SVM, representation-based classification, and XGBoost. However, they neglect the entities in the comparative context (Panchenko et al., 2019). ED-GAT (Ma et al., 2020), a more recent work, uses the dependency graph to better recognize long-distance comparisons and to avoid falsely identifying unrelated comparison predicates. However, it fails to capture semantic information about the entities, as they are replaced by "entityA" and "entityB". Furthermore, stacking multiple GAT layers severely increases training difficulty.
More recent works widely use complex contextualized NLP models such as BERT (Devlin et al., 2019). Sun et al. (2019) transform ABSA into a question answering task by constructing auxiliary sentences. Phan and Ogunbona (2020) build a pipeline of aspect extraction and ABSA and use wide and concentrated features for sentiment classification. ABSA is related to CPC by nature: in general, entities with positive sentiments are preferred over ones with neutral or negative sentiments. Therefore, the performance of a CPC model can be enhanced by ABSA techniques.

SAECON
In this section, we first formalize the problem and then explain SAECON in detail. The pipeline of SAECON is depicted in Figure 1 with the essential notations.

Problem Statement
CPC Given a sentence s from the CPC corpus D_c with n tokens and two entities e_1 and e_2, a CPC model predicts whether there exists a preference comparison between e_1 and e_2 in s and, if so, which entity is preferred over the other. The possible results are Better (e_1 wins), Worse (e_2 wins), or None (no comparison exists).
ABSA Given a sentence s from the ABSA corpus D_s with m tokens and one entity e, ABSA identifies the sentiment (positive, negative, or neutral) associated with e.
We denote the source domains of the CPC and ABSA datasets by 𝔻_c and 𝔻_s; the datasets D_c and D_s contain samples drawn from 𝔻_c and 𝔻_s, respectively. 𝔻_c and 𝔻_s are similar but differ in topics, which produces a domain shift. For simplicity in later discussion, we use s to denote a sentence in D_c ∪ D_s and E to denote its entity set; |E| = 2 if s ∈ D_c and |E| = 1 otherwise.
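To make the two problem settings concrete, the following minimal sketch shows how instances from the two corpora might be represented in code. The field names (`sentence`, `entities`, `label`, `domain`) are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Instance:
    """One sample from either corpus; field names are illustrative."""
    sentence: List[str]  # tokenized sentence s
    entities: List[str]  # |entities| == 2 for CPC (e1, e2), 1 for ABSA
    label: str           # CPC: Better/Worse/None; ABSA: positive/negative/neutral
    domain: int          # 0 = CPC corpus D_c, 1 = ABSA corpus D_s

cpc = Instance(["Python", "is", "better", "than", "MATLAB"],
               ["Python", "MATLAB"], "Better", 0)
absa = Instance(["I", "liked", "the", "service"],
                ["service"], "positive", 1)
```

A CPC instance always carries exactly two entities, while an ABSA instance carries one, matching |E| = 2 and |E| = 1 above.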

Text Feature Representations
A sentence is encoded into word representations by a text encoder and parsed into a dependency graph by a dependency parser (Chen and Manning, 2014). A text encoder, such as GloVe (Pennington et al., 2014) or BERT (Devlin et al., 2019), maps a word w into a low-dimensional embedding w ∈ R^{d_0}. GloVe assigns a fixed vector, while BERT computes a token representation from its textual context. The encoding output of s is denoted by S_0 = {w_1, ..., e_1, ..., e_2, ..., w_n}, where e_i denotes the embedding of entity i, w_i denotes the embedding of a non-entity word, and w_i, e_j ∈ R^{d_0}.
The dependency graph of s, denoted by G_s, is obtained by applying a dependency parser such as the Stanford Parser (Chen and Manning, 2014) or spaCy to s. G_s is a syntactic view of s (Marcheggiani and Titov, 2017; Li et al., 2016) composed of word vertices and directed edges of dependency relations. Advantageously, complex syntactic relations between distant words in the sentence can be detected within a small number of hops over the dependency edges (Ma et al., 2020).

Contextual Features for CPC
Global Semantic Context To model the more extended context of the entities, we use a bidirectional LSTM (BiLSTM) to encode the entire sentence in both directions. Bi-directional recurrent neural networks are widely used in extracting semantics (Li et al., 2019). Given the indices of e_1 and e_2 in s, the global context representations h_{g,1} and h_{g,2} are computed by averaging the hidden states at the corresponding entity positions. (BERT generates representations of wordpieces, which can be substrings of words. If a word is broken into wordpieces by the BERT tokenizer, the average of the wordpiece representations is taken as the word representation. The representations of BERT's special tokens, [CLS] and [SEP], are not used.)
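The averaging described above (wordpieces into a word vector, and entity-position hidden states into a global context vector) can be sketched as follows. This is a pure-Python illustration of the averaging rule, not the authors' code.

```python
def average_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# A word split into wordpieces gets the mean of its wordpiece
# representations as its word representation.
word_vec = average_vectors([[1.0, 2.0], [3.0, 4.0]])  # two wordpieces

def entity_global_context(hidden_states, entity_positions):
    """Global context h_{g,i}: average of the BiLSTM hidden states
    at the entity's token positions (a sketch)."""
    return average_vectors([hidden_states[p] for p in entity_positions])
```

For a multi-token entity spanning positions 1 and 2, `entity_global_context(H, [1, 2])` averages the two corresponding hidden states.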
Local Syntactic Context In SAECON, we use the dependency graph to capture the syntactically neighboring context of the entities, which contains words or phrases that modify the entities and indicate comparative preferences. We apply a Syntactic Graph Convolutional Network (SGCN) (Bastings et al., 2017; Marcheggiani and Titov, 2017) to G_s to compute the local context features h_{l,1} and h_{l,2} for e_1 and e_2, respectively. SGCN operates on directed dependency graphs with three major adjustments compared with GCN (Kipf and Welling, 2017): considering the directionality of edges, separating parameters for different dependency labels, and applying edge-wise gating to message passing. GCN is a multi-layer message-propagation-based graph neural network. Given a vertex v in G_s and its neighbors N(v), the vertex representation of v on the (j+1)th layer is given as

h_v^{(j+1)} = ρ( Σ_{u ∈ N(v)} ( W^{(j)} h_u^{(j)} + b^{(j)} ) ),

where ρ(·) denotes an aggregation function such as mean or sum, W^{(j)} ∈ R^{d^{(j+1)} × d^{(j)}} and b^{(j)} ∈ R^{d^{(j+1)}} are trainable parameters, and d^{(j+1)} and d^{(j)} denote the latent feature dimensions of the (j+1)th and the jth layers, respectively. SGCN improves GCN by considering different edge directions and diverse edge types, assigning different parameters to different directions or labels. However, there is one caveat: a purely direction-based parameterization cannot accommodate the rich edge type information, whereas a purely label-based one causes combinatorial over-parameterization, an increased risk of overfitting, and reduced efficiency. We therefore adopt a trade-off: direction-specific weights and label-specific biases.
The edge-wise gating selects impactful neighbors by controlling the gates for message propagation through edges. The gate on the jth layer for an edge between vertices u and v is defined as

g_{uv}^{(j)} = σ( h_u^{(j)} · β_{d_uv} + γ^{(j)}_{l_uv} ),

where d_uv and l_uv denote the direction and label of edge (u, v), β_{d_uv} and γ^{(j)}_{l_uv} are trainable parameters, and σ(·) denotes the sigmoid function.
Summing up the aforementioned adjustments to GCN, the final vertex representation learning is

h_v^{(j+1)} = ρ( Σ_{u ∈ N(v)} g_{uv}^{(j)} ( W^{(j)}_{d_uv} h_u^{(j)} + b^{(j)}_{l_uv} ) ).

The vectors of S_0 serve as the input representations to the first SGCN layer. The output representations corresponding to e_1 and e_2 are {h_{l,1}, h_{l,2}} with dimension d_l.
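Putting the pieces together, one SGCN layer with direction-specific weights, label-specific biases, and edge-wise gates can be sketched in pure Python as below. This is an illustrative re-implementation of the described update rule under small simplifying assumptions (sum aggregation, nonlinearity omitted), not the authors' code.

```python
import math

def matvec(W, x):
    """Matrix-vector product over nested lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgcn_layer(h, edges, W_dir, b_lab, beta_dir, gamma_lab):
    """One SGCN layer: sum aggregation over incoming edges with
    direction-specific weights W_dir, label-specific biases b_lab,
    and edge-wise gates g_uv = sigmoid(h_u . beta_d + gamma_l).
    h: {vertex: feature vector}; edges: (u, v, direction, label) tuples."""
    dim = len(next(iter(b_lab.values())))
    h_new = {v: [0.0] * dim for v in h}
    for u, v, d, l in edges:
        gate = sigmoid(sum(a * b for a, b in zip(h[u], beta_dir[d]))
                       + gamma_lab[l])
        msg = matvec(W_dir[d], h[u])
        for i in range(dim):
            h_new[v][i] += gate * (msg[i] + b_lab[l][i])
    return h_new  # apply the aggregation rho / nonlinearity as needed
```

With zero-initialized gate parameters, every gate evaluates to sigmoid(0) = 0.5, so each incoming message is halved before aggregation.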

Sentiment Analysis with Knowledge Transfer from ABSA
We have discussed in Section 1 that ABSA inherently correlates with the CPC task. Therefore, it is natural to incorporate a sentiment analyzer into SAECON as an auxiliary task, taking advantage of the abundant training resources of ABSA to boost the performance on CPC. There are two paradigms for auxiliary tasks: (1) incorporating fixed parameters that are pretrained solely on the auxiliary dataset; (2) incorporating the architecture only, with untrained parameters, and jointly optimizing them from scratch simultaneously with the main task (Li et al., 2018; He et al., 2018; Wang and Pan, 2018).
Option (1) ignores the domain shift between D_c and D_s, which degrades the quality of the learned sentiment features, since the domain identity information is noisy and unrelated to the CPC task. SAECON uses option (2). For a smooth and efficient knowledge transfer from D_s to D_c under option (2), the ideal sentiment analyzer extracts only the textual features that are contingent on sentimental information but orthogonal to the identity of the source domain. In other words, the learned sentiment features are expected to be discriminative for sentiment analysis but invariant with respect to the domain shift. As a result, the sentiment features are better aligned with the CPC domain, with reduced noise from the domain shift.
In SAECON, we use a gradient reversal layer (GRL) and a domain classifier (DC) (Ganin and Lempitsky, 2015) for domain adaptive sentiment feature learning that maintains both discriminativeness and domain invariance. GRL+DC is a straightforward, generic, and effective modification to neural networks for domain adaptation (Kamath et al., 2019; Gu et al., 2019; Belinkov et al., 2019; Li et al., 2018). It can effectively close the shift between complex distributions (Ganin and Lempitsky, 2015) such as D_c and D_s.
Let A denote the sentiment analyzer, which alternately learns sentiment information from D_s and provides sentimental clues about the compared entities in D_c. Specifically, each CPC instance is split into two ABSA samples with the same text before being fed into A (see "Split to 2" in Figure 1): one takes e_1 as the queried aspect and the other takes e_2.
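The "Split to 2" step is simple enough to sketch directly; the helper name below is our own, not from the paper.

```python
def split_to_two(sentence, e1, e2):
    """One CPC instance yields two ABSA-style (sentence, aspect) queries
    sharing the same text, one per compared entity."""
    return [(sentence, e1), (sentence, e2)]

pairs = split_to_two("Python is better than MATLAB", "Python", "MATLAB")
```

The analyzer A then scores each (sentence, aspect) pair independently, exactly as it would score an ordinary ABSA sample.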
The analyzer A outputs sentiment features h_{s,1} and h_{s,2} for the two entities of a CPC instance, or h_s for an ABSA instance, where h_{s,1}, h_{s,2}, h_s ∈ R^{d_s}. These outputs are sent not only to the CPC and ABSA predictors shown in Figure 1 but also, through a GRL, to the DC, which predicts the source domain y_d of s (y_d = 1 if s ∈ D_s, otherwise 0). The GRL is transparent in the forward pass (GRL_α(x) = x) and reverses the gradients in the backward pass:

∂GRL_α(x)/∂x = −α I,

where x is the input to the GRL, α is a hyperparameter, and I is an identity matrix. During training, the reversed gradients maximize the domain loss, forcing A to forget the domain identity via backpropagation and mitigating the domain shift. Therefore, the outputs of A stay invariant to the domain shift. But since the outputs of A are also optimized for the ABSA predictions, their distinctiveness with respect to sentiment classification is retained. Finally, the selection of A is flexible, as our framework is architecture-agnostic. In this paper, we use the LCF-ASC aspect-based sentiment analyzer proposed by Phan and Ogunbona (2020), in which two scales of representations are concatenated to learn the sentiments toward the entities of interest.
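The GRL's forward/backward behavior can be captured in a framework-agnostic sketch; in practice it would be implemented as a custom autograd function in a deep learning framework, but the semantics are just these two methods.

```python
class GradientReversalLayer:
    """Sketch of a GRL (Ganin and Lempitsky, 2015): identity in the
    forward pass, incoming gradient scaled by -alpha in the backward
    pass. Illustrative only; real use needs a framework's autograd hook."""
    def __init__(self, alpha):
        self.alpha = alpha

    def forward(self, x):
        return list(x)  # GRL_alpha(x) = x

    def backward(self, grad_output):
        # d GRL_alpha(x) / dx = -alpha * I, applied to the upstream gradient
        return [-self.alpha * g for g in grad_output]
```

Anything upstream of the GRL thus receives the negated (and scaled) domain-classification gradient, which is what pushes A toward domain-invariant features.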

Objective and Optimization
SAECON jointly optimizes three classification losses, for CPC, ABSA, and domain classification. For the CPC task, the local context, global context, and sentiment features are concatenated: h_{e_i} = [h_{g,i}; h_{l,i}; h_{s,i}], i ∈ {1, 2}, with h_{e_i} ∈ R^{d_s + d_g + d_l}. With F_c, F_s, F_d, and F denoting fully-connected neural networks with non-linear activation layers, the CPC, ABSA, and domain predictions are obtained as

ŷ_c = δ(F_c(F([h_{e_1}; h_{e_2}]))),  ŷ_s = δ(F_s(h_s)),  ŷ_d = δ(F_d(GRL_α(h_s))),

where δ denotes the softmax function. With these predictions, SAECON computes the cross-entropy losses of the three tasks as L_c, L_s, and L_d, respectively; the label for L_d is y_d. The detailed loss computations are omitted due to the space limit. In summary, the objective function of SAECON is

L = L_c + λ_s L_s + λ_d L_d + λ ||Θ||_2^2,

where λ_s and λ_d are the weights of the two auxiliary losses and λ is the weight of an L2 regularization over the parameters Θ. We denote λ = {λ_s, λ_d, λ}. In the actual training, we separate the iterations over the CPC and ABSA data and input batches from the two domains alternately. Alternating inputs ensure that the DC receives batches with different labels evenly and avoid overfitting to either domain label. A stochastic gradient descent-based optimizer, Adam (Kingma and Ba, 2015), is used to optimize the parameters of SAECON. Algorithm 1 in Section A.1 explains the alternating training paradigm in detail.
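The softmax prediction and the weighted loss combination are mechanical enough to sketch; this is a numerical illustration of the objective's arithmetic, not the model code.

```python
import math

def softmax(logits):
    """Numerically stable softmax (the delta in the predictions)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def total_loss(L_c, L_s, L_d, l2_penalty, lambda_s, lambda_d, lam):
    """L = L_c + lambda_s * L_s + lambda_d * L_d + lam * ||theta||^2."""
    return L_c + lambda_s * L_s + lambda_d * L_d + lam * l2_penalty
```

Setting lambda_s or lambda_d to zero recovers the ablations without the ABSA or domain objectives.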

Experimental Settings
Dataset CompSent-19 is the first public dataset for the CPC task, released by Panchenko et al. (2019). It contains sentences with entity annotations. The ground truth is obtained by comparing the entity that appears earlier in the sentence (e_1) with the one that appears later (e_2). The dataset is split by convention (Panchenko et al., 2019; Ma et al., 2020).
Three restaurant datasets released in SemEval 2014, 2015, and 2016 (Pontiki et al., 2014, 2015; Xenos et al., 2016) are utilized for the ABSA task. We join their training sets and randomly sample instances into batches to optimize the auxiliary objective L_s. The proportions of POS, NEU, and NEG instances are 65.8%, 11.0%, and 23.2%, respectively.
Note. The rigorous definition of None in CompSent-19 is that the sentence does not contain a comparison between the two annotated entities, rather than that the sentence contains no comparison at all.
Reproducibility The implementation of SAECON is publicly available on GitHub. Details for reproduction are given in Section A.2.
Baseline Models Seven models experimented in Panchenko et al. (2019) and the state-of-the-art ED-GAT (Ma et al., 2020) serve as the baselines.
Table 2: Performance comparisons between the proposed model and the baselines on F1 scores (%). "-B" and "-G" denote versions of the model using BERT (Devlin et al., 2019) and GloVe (Pennington et al., 2014) as the input embeddings, respectively. All reported improvements over the best baselines are statistically significant with p-value < 0.01.

Ablation Studies
Four points are worth noting. Firstly, SAECON with all modules achieves the best performance on three out of four metrics, demonstrating the effectiveness of all modules (SAECON vs. the rest). Secondly, the synergy of A and GRL improves the performance (−(A+GRL) vs. SAECON), whereas A without domain adaptation hurts the classification accuracy instead (−GRL vs. SAECON), which indicates that the auxiliary sentiment analyzer benefits CPC accuracy only with the assistance of the GRL+DC modules. Thirdly, removing the global context causes the largest performance deterioration (−BiLSTM vs. SAECON), showing the significance of long-term information. This observation is consistent with the findings of Ma et al. (2020).
Table 3: Ablation studies on F1 scores (%) between SAECON and its variants with modules disabled. "−X" denotes a variant without module X. Removing A also removes GRL+DC.
Hyperparameter Searching We demonstrate the influence of several key hyperparameters: the initial learning rate (LR, η), the feature dimensions (d = {d_g, d_l, d_s}), the regularization weight λ, and the configurations of SGCN such as directionality, gating, and the number of layers.
For LR, d, and λ in Figures 2a, 2b, and 2c, we observe a single peak for F1(W) (green curves) and fluctuating F1 scores for the other labels and the micro-F1 (blue curves). In addition, the peaks of micro-F1 occur at the same positions as those of F1(W). This indicates that the performance on Worse is the most influential factor for the micro-F1. These observations help us locate the optimal settings. Figure 2d focuses on the effect of the number of SGCN layers. We observe clear oscillations in F1(W) and find the best scores at two layers. More GCN layers result in oversmoothing (Kipf and Welling, 2017) and substantially degrade the accuracy, which is eased but not entirely fixed by the gating mechanism. Therefore, the performance drops slightly at larger layer numbers.
Table 4 shows the impact of directionality and gating. Turning off either the directionality or the gating mechanism leads to degraded F1 scores. SGCN with both disabled drops to the poorest micro-F1 and F1(W); although its F1(N) is the highest, we hardly consider that a good sign. Overall, the benefits of directionality and gating are verified.
Alleviating Data Imbalance The label imbalance severely impairs the model performance, especially on the most underpopulated label, Worse.
The aforementioned imbalance alleviation methods are tested in Table 5. The Original (OR) row is a control experiment using the raw CompSent-19 without any weighting or augmentation. The optimal solution is the weighted loss (WL vs. the rest). One interesting observation is that data augmentation such as flipping labels and upsampling does not provide a performance gain (OR vs. FL and OR vs. UP). The weighted loss performs slightly worse on F1(N) but consistently better on the other metrics, especially on Worse, indicating that it effectively alleviates the imbalance issue. In practice, static weights found via grid search are assigned to the different labels when computing the cross-entropy loss. We leave the exploration of dynamic weighting methods such as the Focal Loss (Lin et al., 2017) for future work.
Alternative Training One novelty of SAECON is the alternating training that allows the sentiment analyzer to learn both tasks across domains. Here we analyze the impact of different batch ratios (BR) and different domain shift handling methods during training. BR controls the ratio of batches of the two alternating tasks in each training cycle. For example, a BR of 2:3 sends 2 CPC batches followed by 3 ABSA batches in each iteration.
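The alternating input schedule for a given BR can be sketched with a small generator; the function name and tags are ours, illustrating the interleaving only.

```python
from itertools import islice

def alternate_batches(cpc_batches, absa_batches, ratio=(1, 1)):
    """Yield (task, batch) pairs alternating between the two tasks.
    ratio=(2, 3) -> 2 CPC batches then 3 ABSA batches per cycle (a sketch)."""
    cpc_it, absa_it = iter(cpc_batches), iter(absa_batches)
    n_c, n_s = ratio
    while True:
        chunk_c = list(islice(cpc_it, n_c))
        chunk_s = list(islice(absa_it, n_s))
        if not chunk_c and not chunk_s:
            return  # both streams exhausted
        for b in chunk_c:
            yield ("cpc", b)
        for b in chunk_s:
            yield ("absa", b)
```

Feeding the domain classifier batches from both streams in turn is what keeps its two labels evenly represented during training.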
Figure 3a presents the total training time for ten epochs with different BR. A larger BR takes a shorter time; for example, a BR of 1:1 (the leftmost bar) takes a shorter time than 1:5 (the yellow bar). Figure 3b presents the micro-F1 scores for different BR. We observe two points: (1) the reported performances differ only slightly; (2) generally, the performance is better when the CPC batches are fewer than the ABSA ones. Overall, the hyperparameter selection seeks a "sweet spot" between effectiveness and efficiency, which points to a BR of 1:1. As a result, SAECON achieves the best performance, especially on F1(W); −GRL comes second, and "option (1)" performs worst. This demonstrates that (1) the alternating training (blue vs. green) is necessary for effective domain adaptation, and (2) there exists a positive correlation between the level of domain shift mitigation and the model performance, especially on F1(W) and F1(B). Better domain adaptation produces higher F1 scores in scenarios where datasets in the domain of interest, i.e., CPC, are unavailable.

Case Study
In this section, we qualitatively exemplify the contribution of the sentiment analyzer A. Table 6 reports four example sentences from the test set of CompSent-19. The entities e_1 and e_2 are highlighted together with the corresponding sentiments predicted by A. The column "Label" shows the ground truth of CPC. The "∆" column computes the sentiment distance between the entities: we assign +1, 0, and −1 to the polarities POS, NEU, and NEG, respectively, and ∆ is the polarity of e_1 minus that of e_2. A positive distance therefore suggests that e_1 receives a more positive sentiment from A than e_2, and vice versa.
S1: This is all done via the gigabit [Ethernet:POS] interface, rather than the much slower [USB:NEG] interface. (Better, ∆ = +2)
S2: Also, [Bash:NEG] may not be the best language to do arithmetic heavy operations in; something like [Python:NEU] might be a better choice. (Worse, ∆ = −1)
In S1, the sentiments toward Ethernet and USB are predicted positive and negative, respectively, which correctly implies the comparative label Better. S2 is a Worse sentence, with Bash predicted negative, Python predicted neutral, and a resultant negative sentiment distance of −1. For S3 and S4 (see Table 6), the entities are assigned the same polarities, so the sentiment distances are both zero; we can easily tell that no preference comparison exists, which is consistent with the ground truth labels. Due to the limited space, more case studies are presented in Section A.4.
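The ∆ computation and its qualitative reading can be sketched directly from the definitions above; `implied_label` is our illustrative heuristic, not the full SAECON classifier, which also uses the global and local context features.

```python
POLARITY = {"POS": 1, "NEU": 0, "NEG": -1}

def sentiment_distance(p1, p2):
    """Delta = polarity(e1) - polarity(e2); positive favors e1."""
    return POLARITY[p1] - POLARITY[p2]

def implied_label(delta):
    """Heuristic reading of Delta (illustrative only)."""
    if delta > 0:
        return "Better"
    if delta < 0:
        return "Worse"
    return "None"
```

Applied to the examples: S1 gives ∆ = +2 (Better), S2 gives ∆ = −1 (Worse), and S3/S4 give ∆ = 0 (None).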

Conclusion
This paper proposes SAECON, a CPC model that incorporates a sentiment analyzer to transfer knowledge from ABSA corpora. Specifically, SAECON utilizes a BiLSTM to learn global comparative features, a syntactic GCN to learn local syntactic information, and a domain adaptive auxiliary sentiment analyzer that jointly learns from ABSA and CPC corpora for a smooth knowledge transfer. An alternating joint training scheme enables efficient and effective information transfer. Qualitative and quantitative experiments verify the superior performance of SAECON. For future work, we will focus on a deeper understanding of CPC data augmentation and an exploration of loss weighting methods for data imbalance.

Broader Impact Statement
This section states the broader impact of the proposed CPC model. Our model is designed specifically for comparative classification scenarios in NLP. Users can apply our model to detect whether a comparison exists between two entities of interest within the sentences of a particular corpus. For example, a recommender system equipped with our model can tell whether a product is compared with a competitor and, further, which is preferred. In addition, review-based platforms can utilize our model to decide which items are widely welcomed and which are not.
Our model, SAECON, is tested on the CompSent-19 dataset, which was anonymized during collection and annotation. For the auxiliary ABSA task, we also use anonymized public datasets from SemEval 2014 to 2016. Therefore, our model will not cause potential leakages of user identity or privacy. We would like to call for attention to CPC, as it is a relatively new task in NLP research.

Figure 1 :
Figure 1: Pipeline of SAECON. Sentences from the two domains shown in the gray box are fed into the text encoder and the dependency parser. The resultant representations of the entities from the three substructures are distinguished by different colors (see the legend in the corner). "Cls." is short for classifier.
Ablation studies demonstrate the unique contribution of each part of the proposed model. Here we verify the contributions of the following modules: (1) the bi-directional global context extractor (BiLSTM); (2) the syntactic local context extractor (SGCN); (3) the domain adaptation modules of A (GRL); (4) the entire auxiliary sentiment analyzer, including its dependent GRL+DC (A+GRL for short). The results are presented in Table 3.
Figure 2: Searching and sensitivity for four key hyperparameters of SAECON in F1 scores.

Figure 4 :
Figure 4: Visualization of the domain shift mitigation.

S3: It shows how [JavaScript:POS] and [PHP:POS] can be used in tandem to make a user's experience faster and more pleasant. (None, ∆ = 0)
S4: He broke his hand against [Georgia Tech:NEU] and made it worse playing against [Virginia Tech:NEU]. (None, ∆ = 0)
Table 6: Case studies for the effect of the sentiment analyzer A (see Section 4.3 for details).
The dataset is split into 80% for training and 20% for testing. During training, 20% of the training data of each label composes the development set for model selection. The detailed statistics are given in Table 1.

Table 1 :
Statistics of CompSent-19. The rows Flipping labels and Upsampling show the sizes of the augmented datasets used to mitigate label imbalance.
Imbalanced Data CompSent-19 is badly imbalanced (see Table 1). None instances dominate the dataset; the other two labels combined account for only 27%. This critical issue can impair the model performance. Three methods to alleviate the imbalance are tested. Flipping labels: considering the order of the entities, an original Better instance becomes a Worse one (and vice versa) if querying (e_2, e_1) instead; we interchange e_1 and e_2 of all Better and Worse samples so that the two labels have the same amount. Upsampling: we upsample Better and Worse instances by duplication to the same amount as None. Weighted loss: we upweight the underpopulated labels Better and Worse when computing the classification loss. Their effects are discussed in Section 4.2.
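Two of the three alleviation methods are simple transformations that can be sketched as below; the helper names are ours, and the class weights shown are placeholders (the paper finds its static weights via grid search).

```python
import math

def flip(sentence, e1, e2, label):
    """Flipping labels: query (e2, e1) instead of (e1, e2);
    Better and Worse swap while None is unchanged."""
    swap = {"Better": "Worse", "Worse": "Better", "None": "None"}
    return sentence, e2, e1, swap[label]

def weighted_cross_entropy(probs, gold, class_weights):
    """Weighted loss: upweight underpopulated labels. probs maps each
    label to its predicted probability; weights are placeholders."""
    return -class_weights[gold] * math.log(probs[gold])
```

Upsampling, the third method, is plain duplication of Better and Worse instances until they match the None count.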

Ma et al. (2020) similarly use eight to ten stacking GAT layers for global feature learning. Finally, the performance also drops after removing the SGCN (−SGCN vs. SAECON), but the drop is smaller than that of removing the BiLSTM. Therefore, the local context plays a less important role than the global context (−SGCN vs. −BiLSTM).

Table 4 :
Searching and sensitivity for the directionality and gating of SGCN in F1 scores (%).

Table 5 :
Performance analysis with F1 scores (%) for different methods to mitigate data imbalance.