Sentiment Analysis using the Relationship between Users and Products

In product reviews, user and product aspects are useful in sentiment analysis. Nevertheless, previous studies mainly focus on modeling user and product aspects without considering the relationship between users and products. The relationship between users and products is typically helpful in estimating the bias of a user toward a product. In this paper, we therefore introduce a Graph Neural Network-based model with a pre-trained Language Model (GNNLM), where the relationship between users and products is incorporated. We conducted experiments on three well-known benchmarks for sentiment classification with user and product information. The experimental results show that the relationship between users and products improves the performance of sentiment analysis. Furthermore, GNNLM achieves state-of-the-art results on the Yelp-2013 and Yelp-2014 datasets.


Introduction
Sentiment analysis aims to understand a user's opinion toward a product. The goal is to infer the sentiment polarity or intensity of a review document (Pang et al., 2008; Liu, 2012). Recently, user and product information in a review has been proven to be helpful for sentiment analysis models (Tang et al., 2015). Consequently, many studies investigate how to model user and product aspects and incorporate them into deep neural network models.
Nevertheless, none of them focuses on the relationship between users and products. This relationship typically provides the bias of a user's sentiment toward a product. For example, suppose users A and B share similar sentiments on many products. If there is a product for which we do not know user A's sentiment but we know user B's sentiment, we might be able to infer user A's sentiment from user B's. In addition, if a user has a high expectation toward a product but the product does not meet that expectation, it would greatly impact the user's sentiment. Meanwhile, the interaction between users and products has proven to be useful in other tasks, such as spam detection (Wang et al., 2012) and citation recommendation (Jeong et al., 2020; Bhowmick et al., 2021). Based on these observations, we assume that the relationship between users and products could provide a clue to help sentiment analysis.
In this paper, we therefore propose a new approach using graph neural networks with a pre-trained language model, namely GNNLM. In GNNLM, the relationship between the user and the product is captured by the graph neural network as distributed representations and then combined with a distributed representation of the review obtained from a pre-trained language model to predict the sentiment label. We conduct experiments on three benchmarks (IMDB, Yelp-2013, and Yelp-2014) for sentiment classification with user and product information. The results show that incorporating the relationship between the user and the product could help improve the performance of the sentiment analysis model.

Related Work
Recent studies have shown that user and product information is useful for sentiment analysis. The first study (Tang et al., 2015) argues that user and product information is consistent with the sentiment of a review. They propose UPNN, which incorporates user and product preference matrices into a CNN-based model to modify the meaning of word representations. UPDMN (Dou, 2017) uses a deep memory network to capture user and product preferences with an LSTM-based model. NSC (Chen et al., 2016) is a hierarchical neural network with an attention mechanism that captures global user and product information. HCSC (Amplayo et al., 2018) investigates the cold-start problem for sentiment analysis with user and product information by introducing shared user and product representations. DUPMN (Long et al., 2018) uses a hierarchical LSTM-based model to encode the document with dual memory networks, one for user information and the other for product information. CMA (Ma et al., 2017) encodes the document using a hierarchical LSTM-based model in which user and product information is injected hierarchically. BiLSTM + basis-cust (Kim et al., 2019) combines categorical metadata of users and products with the neural network model. CHIM (Amplayo, 2019) utilizes chunk-wise matrices to represent the user and product aspects and injects them into different locations of the model. IUPC (Lyu et al., 2020) is built on stacked attention with BERT to memorize the historical reviews of a user and all reviews of a product. MA-BERT (Zhang et al., 2021) is a multi-attribute BERT, where user and product aspects are incorporated into the BERT model.
Based on our survey, none of them investigates the relationship between users and products for sentiment analysis.

Our Approach
As shown in Fig. 1, our approach, GNNLM, consists of three components: 1) graph neural networks, 2) a pre-trained language model, and 3) a classification layer. The task definition and the details of each component are described as follows.

Task Definition
Sentiment analysis with user and product information is the task of predicting the intensity of the polarity of a review using text, user, and product information. The task is defined as follows. Let U = {u_1, u_2, u_3, ..., u_n}, P = {p_1, p_2, p_3, ..., p_m}, and R be the sets of users, products, and reviews, respectively. A user u_x ∈ U writes a review r_{u_x,p_y} ∈ R about the product p_y ∈ P. A review r is represented by d sentences {s_1, s_2, s_3, ..., s_d}, and the i-th sentence s_i consists of l_i words {w_1, w_2, w_3, ..., w_{l_i}}. The objective of the task is to model the function

f : (r_{u_x,p_y}, u_x, p_y) → η,

where η is the polarity of the review r_{u_x,p_y} on a Likert scale from 1 to K, and K is the number of polarity classes.

Graph Neural Networks
Graph Neural Networks (GNNs) are neural models that can capture the dependency between nodes in a graph via message passing (Zhou et al., 2020). Recently, GNNs have been shown to be effective for various graph-related applications, e.g., link prediction (Zhang and Chen, 2018), due to their ability to learn structural information from the graph. In our study, we build the user-product graph and use GNNs to learn structural information representing the relationship between users and products.
In our task, there are two types of nodes: user and product. The user-product graph is defined as the heterogeneous graph G = (V_U ∪ V_P, E), where V_U, V_P, and E are the set of user nodes, the set of product nodes, and the set of edges between users and products, respectively. All users in U and products in P are used to create user and product nodes. For edges, if user u_x writes a review about the product p_y, there are two edges: (v_{u_x}, v_{p_y}) and (v_{p_y}, v_{u_x}), where v_{u_x} ∈ V_U and v_{p_y} ∈ V_P. To avoid leaking structural information between users and products, we only use the training set to build the graph G.
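The graph construction above can be sketched as follows. This is a minimal illustration assuming reviews arrive as (user, product) pairs; the function name and data layout are hypothetical, not taken from the paper's released code.

```python
# Hypothetical sketch: building the user-product graph from training
# reviews as a set of directed edges in both directions, mirroring the
# pair of edges (v_ux, v_py) and (v_py, v_ux) described in the text.

def build_user_product_graph(train_reviews):
    """train_reviews: iterable of (user_id, product_id) pairs from the
    training set only, to avoid leaking structural information."""
    users, products, edges = set(), set(), set()
    for u, p in train_reviews:
        users.add(("user", u))
        products.add(("product", p))
        # one edge in each direction, as in the paper
        edges.add((("user", u), ("product", p)))
        edges.add((("product", p), ("user", u)))
    return users, products, edges
```

Duplicate reviews by the same user for the same product collapse into one edge pair, since the edge set is deduplicated.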
To learn representations of users and products, we use GraphSAGE (Hamilton et al., 2017) as the graph neural network operator to aggregate the structural information of the graph G. One advantage of GraphSAGE is that it can leverage the topological structure of neighbor nodes to learn and generalize embeddings of unseen nodes. Formally, the representation of nodes in the graph G is computed as follows:

h^i_v = σ(W^i · aggregate({h^{i-1}_u, ∀u ∈ N_v ∪ {v}})),

where aggregate(·) is the function to aggregate information from neighbor nodes, σ(·) is the activation function, N_v is the set of all neighbor nodes of the node v, W^i is a set of weight matrices used to propagate information between different layers, and h^i_v is the representation of the node v at the i-th layer. By computing representations of all nodes, we can encode the relationship between the user and the product as a vector representation.
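One GraphSAGE layer with a mean aggregator can be sketched in numpy as below. This follows the common variant with separate weights for the self node and the averaged neighbors (as in PyG's SAGEConv); it is an illustrative sketch, not the paper's implementation, and the activation choice is an assumption.

```python
import numpy as np

def sage_layer(h, neighbors, w_self, w_neigh):
    """One mean-aggregator GraphSAGE layer (illustrative sketch).

    h: (num_nodes, d_in) node features at layer i-1.
    neighbors: dict node index -> list of neighbor indices.
    w_self, w_neigh: (d_in, d_out) weight matrices.
    Returns (num_nodes, d_out) representations at layer i."""
    out = np.zeros((h.shape[0], w_self.shape[1]))
    for v in range(h.shape[0]):
        nbrs = neighbors.get(v, [])
        # aggregate(.) = mean over neighbor representations
        agg = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
        out[v] = h[v] @ w_self + agg @ w_neigh
    return np.maximum(out, 0.0)  # sigma(.) assumed to be ReLU here
```

Stacking several such layers lets a node's representation absorb information from multi-hop neighborhoods, which is how user and product embeddings come to reflect their relationship.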

Pre-trained Language Model
Pre-trained language models, such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019), can achieve remarkable performance on many NLP tasks through fine-tuning. In our study, we use a pre-trained language model to learn the representation of a review. Using a word-piece tokenizer (Wu et al., 2016), the review r_{u_x,p_y} can be represented as a sequence of tokens c_{r_{u_x,p_y}} = {[CLS], w^{s_1}_1, w^{s_1}_2, ..., w^{s_2}_1, ..., w^{s_d}_{l_d}}, where [CLS] is a special token representing the whole sequence. To obtain the representation of review r_{u_x,p_y}, we feed the sequence c_{r_{u_x,p_y}} into the pre-trained language model as follows:

h_{cls} = f_LM(c_{r_{u_x,p_y}}; θ_LM),

where f_LM(·) is the pre-trained language model, and θ_LM is its set of trainable parameters initialized from the pre-trained language model checkpoint.

Classification Layer
The classification layer is the final layer that combines the representation of the review r_{u_x,p_y} with the representations of the user u_x and the product p_y to predict the intensity of the polarity. In the classification layer, the representations of r_{u_x,p_y}, u_x, and p_y are concatenated and then passed into a feed-forward neural network with a rectified linear unit (ReLU) activation to project them into the target space of polarity classes. The classification layer can be defined as:

z = ReLU(W_K [h_{cls}; h_{u_x}; h_{p_y}] + b_K),

where h_{cls} is the representation of review r_{u_x,p_y} from the pre-trained language model, h_{u_x} and h_{p_y} are the representations of user u_x and product p_y from the GNNs, and W_K and b_K are the parameters of the neural network. Then, the softmax function in Eq. 5 is used to normalize the polarity distribution:

p_i = exp(z_i) / Σ^K_{j=1} exp(z_j),

where K is the number of polarity classes.
To learn and optimize our model, we use a cross-entropy loss function defined as follows:

L = -Σ_{r∈R} Σ^K_{i=1} y_{r,i} log(p_{r,i}),

where y_{r,i} represents agreement with the ground truth. Its value is 1 if the gold polarity class of the review r is i; otherwise it is 0.
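The classification layer and loss described above can be sketched in numpy as follows. This is a minimal illustration of the concatenate-project-softmax-loss pipeline; the function name and shapes are assumptions for the example, not the paper's code.

```python
import numpy as np

def predict_and_loss(h_cls, h_u, h_p, W, b, gold):
    """Concatenate review/user/product vectors, project with ReLU,
    normalize with softmax over K classes, and compute the
    cross-entropy loss against the gold class index.

    h_cls, h_u, h_p: 1-D representation vectors.
    W: (d_cls + d_u + d_p, K) weight matrix, b: (K,) bias.
    Returns (class probabilities, scalar loss)."""
    z = np.maximum(np.concatenate([h_cls, h_u, h_p]) @ W + b, 0.0)
    z = z - z.max()                 # numerical stabilization
    p = np.exp(z) / np.exp(z).sum() # softmax normalization
    loss = -np.log(p[gold])         # cross-entropy with one-hot gold
    return p, loss
```

With zero weights the projection is zero, so the softmax yields the uniform distribution and the loss equals log K, which is a handy sanity check when wiring up such a head.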

Experimental Setup
Setting. The experimental setting follows that of the study (Tang et al., 2015). In this setting, there are three benchmarks: IMDB, Yelp-2013, and Yelp-2014. The evaluation metrics are accuracy (Acc) and root mean squared error (RMSE).
Implementation. In GNNLM, we implement the GNNs using SAGEConv (Hamilton et al., 2017) and the pre-trained language model using RoBERTa (Liu et al., 2019) from Huggingface (Wolf et al., 2020). Note that in our preliminary experiments with pre-trained language models, we were unable to reproduce the results for BERT reported in (Lyu et al., 2020; Zhang et al., 2021) on the IMDB dataset. However, we could achieve results comparable to those presented in (Lyu et al., 2020) by utilizing RoBERTa. To ensure fairness in the evaluation, we therefore selected RoBERTa as the pre-trained language model. The dimension of each node in the GNNs and the dimension of the hidden representations of RoBERTa are 768. The maximum sequence length of RoBERTa is 512. The AdamW optimizer (Loshchilov and Hutter, 2017) is used with the learning rate set to 2e-5. The batch size is set to 32. In the fine-tuning process, the model is trained for up to 10 epochs on the training set. We select the best hyper-parameters on the dev set for evaluation on the test set. The source code and the settings for the experiments are available in the GitHub repository.[1] While we can simply fine-tune the pre-trained language model, the user and product representations from the GNNs are randomly initialized and need to be trained from scratch. To better learn the user and product representations before combining them, we train GNNLM with only the GNNs for 100 epochs on the training set and save it as the GNNs checkpoint. In the fine-tuning process, the RoBERTa checkpoint and the GNNs checkpoint are loaded to initialize the models.
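The two-stage schedule described above can be sketched as below. The step functions are placeholders standing in for real optimizer steps, and the checkpoint handling is simplified; this shows only the control flow, not the paper's actual training code.

```python
# Illustrative sketch of the two-stage training schedule: the GNNs are
# first trained alone for a number of epochs and saved as a checkpoint,
# then the RoBERTa checkpoint and the GNNs checkpoint together
# initialize the combined model for joint fine-tuning.

def train_two_stage(gnn_step, joint_step, gnn_epochs=100, finetune_epochs=10):
    """gnn_step / joint_step: callables performing one epoch of
    training and returning the updated state (placeholders here)."""
    gnn_state = None
    for _ in range(gnn_epochs):          # stage 1: GNNs only
        gnn_state = gnn_step(gnn_state)
    # stage 2: initialize from both checkpoints, fine-tune end to end
    state = {"gnn": gnn_state, "lm": "roberta-checkpoint"}
    for _ in range(finetune_epochs):
        state = joint_step(state)
    return state
```

Pre-training the randomly initialized GNN embeddings first avoids disrupting the already well-initialized language model with large gradients from untrained components.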

Result and Discussion
The experimental results are listed in Table 1. Considering the variations of our GNNLM models, we found that GNNLM outperforms GNNLM-GNNs and GNNLM-LM. This suggests that the representation learned from the relationship between users and products could help improve the performance of sentiment analysis. GNNLM-GNNs mostly achieves better results than Majority, which can be considered a heuristic approach using the majority polarity between users and products. From the results, GNNLM-GNNs can encode structural information that is more useful than the majority polarity between users and products. Nonetheless, GNNLM-GNNs could suffer from the sparsity problem. The density of the user-product graph on IMDB, Yelp-2013, and Yelp-2014 is 0.06, 0.05, and 0.02, respectively. The graph in Yelp-2014 is sparser than the others. This sparsity problem could be the reason for the lack of improvement in RMSE of GNNLM-GNNs compared with Majority. To further study the impact of the sparsity problem, we analyzed the results based on the degree of a node in the graph. We found that nodes with lower degrees tend to yield lower performance. Therefore, sparsity impacts the performance of GNNLM-GNNs.
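One plausible way to compute the density figures discussed above is the fraction of observed user-product pairs over all possible pairs in the bipartite graph. The paper does not spell out its density formula, so this sketch is an assumption for illustration only.

```python
# Hedged sketch: bipartite graph density as the ratio of distinct
# observed (user, product) review pairs to all possible pairs.
# This is one common definition; the paper's exact formula is unstated.

def bipartite_density(review_pairs, num_users, num_products):
    unique_pairs = set(review_pairs)
    return len(unique_pairs) / (num_users * num_products)
```

Under this definition, a density of 0.02 means only 2% of all user-product pairs have a review, so most nodes see few neighbors during aggregation.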
Comparing our GNNLM with the systems on the leaderboard, we found that GNNLM achieves the best performance on the Yelp-2013 and Yelp-2014 datasets. For the IMDB dataset, GNNLM outperforms most systems, except for MA-BERT in both metrics and CHIM and ISAR in the Acc metric. GNNLM could not surpass MA-BERT due to the performance of the base model. GNNLM-LM, BERT (IUPC), and BERT (MA-BERT) are pre-trained language models without user and product information. On the Yelp-2013 and Yelp-2014 datasets, the performances of these approaches are comparable; however, on the IMDB dataset, BERT (MA-BERT) significantly outperforms GNNLM-LM and BERT (IUPC). Therefore, the large difference in the base model's performance could be the main reason for the gap between GNNLM and MA-BERT on the IMDB dataset.

Conclusion
This paper introduces GNNLM, GNNs with a pre-trained language model for sentiment analysis with user and product information. Unlike previous studies, we incorporate the relationship between users and products into the model using GNNs. Experimental results show that the representations learned from the relationship between users and products contribute to sentiment analysis models.
In the future, we will attempt to incorporate user and product aspects extracted from reviews into the graph.

Limitations
Our approach relies on the performance of the pre-trained language model. Although using a graph neural network with the user-product graph helps improve performance in sentiment analysis, the pre-trained language model still plays an important role in the task. If the pre-trained language model cannot obtain good results, overall performance will be affected, as discussed for the IMDB dataset.
Furthermore, the graph density could affect the performance of GNNLM-GNNs, as discussed in the experimental results. Since GNNLM is built on top of GNNLM-GNNs, GNNLM is also affected by the sparsity problem. As reported above, the density of the user-product graph on the IMDB, Yelp-2013, and Yelp-2014 datasets is 0.06, 0.05, and 0.02, respectively; the greater the value, the denser the graph. Comparing GNNLM with GNNLM-LM, we found that the improvements obtained on the IMDB, Yelp-2013, and Yelp-2014 datasets are 6.1, 5.0, and 4.8, respectively. The trend of improvement conforms with the density of the graph. Therefore, if the user-product graph is very sparse, it will greatly affect the performance of GNNLM.