Adversarial Learning of Poisson Factorisation Model for Gauging Brand Sentiment in User Reviews

In this paper, we propose the Brand-Topic Model (BTM) which aims to detect brand-associated polarity-bearing topics from product reviews. Unlike existing models for sentiment-topic extraction, which assume that topics are grouped under discrete sentiment categories such as ‘positive’, ‘negative’ and ‘neutral’, BTM is able to automatically infer real-valued brand-associated sentiment scores and generate fine-grained sentiment-topics in which we can observe continuous changes of words under a certain topic (e.g., ‘shaver’ or ‘cream’) while its associated sentiment gradually varies from negative to positive. BTM is built on the Poisson factorisation model with the incorporation of adversarial learning. It has been evaluated on a dataset constructed from Amazon reviews. Experimental results show that BTM outperforms a number of competitive baselines in brand ranking, achieving a better balance of topic coherence and uniqueness, and extracting better-separated polarity-bearing topics.


Introduction
Market intelligence aims to gather data from a company's external environment, such as customer surveys, news outlets and social media sites, in order to understand customer feedback on its products and services and on those of its competitors, and thereby to support better decisions on marketing strategy. Since consumer purchase decisions are heavily influenced by online reviews, it is important to automatically analyse customer reviews for online brand monitoring. Existing sentiment analysis models either classify reviews into discrete polarity categories such as 'positive', 'negative' or 'neutral', or perform more fine-grained sentiment analysis, in which an aspect-level sentiment label is predicted, though still in the discrete polarity category space. We argue that it is desirable to be able to detect subtle topic changes under continuous sentiment scores. This allows us to identify, for example, whether customers with slightly negative views share similar concerns with those holding strongly negative opinions, and which positive aspects customers praise the most. In addition, deriving brand-associated sentiment scores in a continuous space makes it easier to generate a ranked list of brands, allowing for easy comparison.
Existing studies on brand topic detection were largely built on the Latent Dirichlet Allocation (LDA) model (Blei et al., 2003) which assumes that latent topics are shared among competing brands for a certain market. They however are not able to separate positive topics from negative ones. Approaches to polarity-bearing topic detection can only identify topics under discrete polarity categories such as 'positive' and 'negative'. We instead assume that each brand is associated with a latent real-valued sentiment score falling into the range of [−1, 1] in which −1 denotes negative, 0 being neutral and 1 positive, and propose a Brand-Topic Model built on the Poisson Factorisation model with adversarial learning. Example outputs generated from BTM are shown in Figure 1 in which we can observe a transition of topics with varying topic polarity scores together with their associated brands.
More concretely, in BTM, a document-word count matrix is factorised into a product of two positive matrices, a document-topic matrix and a topic-word matrix. A word count in a document is assumed to be drawn from a Poisson distribution with its rate parameter defined as a product of a document-specific topic intensity and its word probability under the corresponding topic, summing over all topics. We further assume that each document is associated with a brand-associated sentiment score and a latent topic-word offset value. The occurrence count of a word is then jointly determined by both the brand-associated sentiment score and the topic-word offset value. The intuition is that if a word tends to occur in documents with positive polarities, but the brand-associated sentiment score is negative, then the topic-word offset value will have an opposite sign, forcing the occurrence count of such a word to be reduced. Furthermore, for each document, we can sample its word counts from their corresponding Poisson distributions and form a document representation which is subsequently fed into a sentiment classifier to predict its sentiment label. If we reverse the sign of the latent brand-associated sentiment score and sample the word counts again, then the sentiment classifier fed with the resulting document representation should generate an opposite sentiment label.
Figure 1: Example topic results generated from the proposed Brand-Topic Model. We observe a transition of topics with varying topic polarity scores. Besides the change of sentiment-related words (e.g., 'problem' in negative topics and 'better' in positive topics), we can also see a change of their associated brands. Users are more positive about BRAUN, negative about REMINGTON, and have mixed opinions on NORELCO.
Our proposed BTM is partly inspired by the recently developed Text-Based Ideal Point (TBIP) model (Vafa et al., 2020) in which the topic-specific word choices are influenced by the ideal points of authors in political debates. However, TBIP is fully unsupervised and when used in customer reviews, it generates topics with mixed polarities. On the contrary, BTM makes use of the document-level sentiment labels and is able to produce better separated polarity-bearing topics. As will be shown in the experiments section, BTM outperforms TBIP on brand ranking, achieving a better balance of topic coherence and topic uniqueness measures.
The contributions of the model are three-fold:
• We propose a novel model built on Poisson Factorisation with adversarial learning for brand topic analysis which can disentangle the sentiment factor from the semantic latent representations to achieve a flexible and controllable topic generation;
• We approximate word count sampling from Poisson distributions by the Gumbel-Softmax-based word sampling technique, and construct document representations based on the sampled word counts, which can be fed into a sentiment classifier, allowing for end-to-end learning of the model;
• The model, trained with the supervision of review ratings, is able to automatically infer the brand polarity scores from review text only.
The rest of the paper is organised as follows. Section 2 presents the related work. Section 3 describes our proposed Brand-Topic Model. Sections 4 and 5 discuss the experimental setup and evaluation results, respectively. Finally, Section 6 concludes the paper and outlines future research directions.

Related Work
Our work is related to the following research:
Poisson Factorisation Models. Poisson factorisation is a class of non-negative matrix factorisation in which a matrix is decomposed into a product of matrices. It has been used in many personalised applications such as personalised budget recommendation (Guo et al., 2017), ranking (Kuo et al., 2018), or content-based social recommendation (Su et al., 2019; de Souza da Silva et al., 2017). Poisson factorisation can also be used for topic modelling where a document-word count matrix is factorised into a product of two positive matrices, a document-topic matrix and a topic-word matrix (Gan et al., 2015; Jiang et al., 2017). In such a setup, a word count in a document is assumed to be drawn from a Poisson distribution with its rate parameter defined as a product of a document-specific topic intensity and its word probability under the corresponding topic, summing over all topics.
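As a minimal sketch of the topic-modelling setup just described, the Python snippet below computes the Poisson rate of each word count as the sum over topics of the document-specific topic intensity times the word's weight under that topic, and then draws counts from the resulting Poisson distributions. The names `theta_d`, `beta`, `poisson_rate` and `sample_poisson`, as well as the toy values, are illustrative assumptions and are not taken from any of the cited implementations.

```python
import math
import random


def poisson_rate(theta_d, beta, v):
    """Rate for word v in one document: sum_k theta_d[k] * beta[k][v]."""
    return sum(theta_d[k] * beta[k][v] for k in range(len(theta_d)))


def sample_poisson(rate, rng):
    """Draw one Poisson sample with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


# Toy example: 2 topics, 3 vocabulary terms (illustrative values only).
theta_d = [1.5, 0.2]                       # document-topic intensities
beta = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]  # topic-word weights
rates = [poisson_rate(theta_d, beta, v) for v in range(3)]

rng = random.Random(0)
counts = [sample_poisson(r, rng) for r in rates]
```

In this toy document the first topic dominates, so the first vocabulary term receives the largest rate and is expected to occur most often.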
Polarity-bearing Topic Models. Early approaches to polarity-bearing topic extraction were built on LDA, in which a word is assumed to be generated from a corpus-wide sentiment-topic-word distribution (Lin and He, 2009). In order to separate topics bearing different polarities, word prior polarity knowledge needs to be incorporated into model learning. In recent years, neural network based topic models have been proposed for many NLP tasks, such as information retrieval (Xie et al., 2015), aspect extraction (He, 2017) and sentiment classification (He et al., 2018). Most of them are built upon the Variational Autoencoder (VAE) (Kingma and Welling, 2014), which constructs a neural network to approximate the topic-word distribution in probabilistic topic models (Srivastava and Sutton, 2017; Sønderby et al., 2016; Bouchacourt et al., 2018). Intuitively, training VAE-based supervised neural topic models with class labels (Chaidaroon and Fang, 2017; Huang et al., 2018; Gui et al., 2020) can introduce sentiment information into topic modelling, which may generate better features for sentiment classification.

Market/Brand Topic Analysis
The classic LDA can also be used to analyse market segmentation and brand reputation in various fields such as finance and medicine (Barry et al., 2018;Doyle and Elkan, 2009). For market analysis, the model proposed by Iwata et al. (2009) used topic tracking to analyse customers' purchase probabilities and trends without storing historical data for inference at the current time step. Topic analysis can also be combined with additional market information for recommendations. For example, based on user profiles and item topics, Gao et al. (2017) dynamically modelled users' interested items for recommendation. Zhang et al. (2015) focused on brand topic tracking. They built a dynamic topic model to analyse texts and images posted on Twitter and track competitions in the luxury market among given brands, in which topic words were used to identify recent hot topics in the market (e.g. Rolex watch) and brands over topics were used to identify the market share of each brand.
Adversarial Learning. Several studies have explored the application of adversarial learning mechanisms to text processing for style transfer (John et al., 2019), disentangling representations (John et al., 2019) and topic modelling (Masada and Takasu, 2018). In particular, Wang et al. (2019) proposed an Adversarial-neural Topic Model (ATM) based on the Generative Adversarial Network (GAN) (Goodfellow et al., 2014), which employs an adversarial approach to train a generator network producing word distributions indistinguishable from topic distributions in the training set. Wang et al. (2020) further extended the ATM model with a Bidirectional Adversarial Topic (BAT) model, using bidirectional adversarial training to incorporate a Dirichlet distribution as prior and exploit the information encoded in word embeddings. Similarly, Hu et al. (2020) build on the aforementioned adversarial approach, adding cycle-consistent constraints.
Although the previous methods make use of adversarial mechanisms to approximate the posterior distribution of topics, to the best of our knowledge, none of them has so far used adversarial learning to lead the generation of topics based on their sentiment polarity and they do not provide any mechanism for smooth transitions between topics, as introduced in the presented Brand-Topic Model.

Brand-Topic Model (BTM)
We propose a probabilistic model for monitoring the assessment of various brands in the beauty market from Amazon reviews. We extend the Text-Based Ideal Point (TBIP) model with adversarial learning and Gumbel-Softmax to construct document features for sentiment classification. The overall architecture of our proposed BTM is shown in Figure 2. In what follows, we will first give a brief introduction of TBIP, followed by the presentation of our proposed BTM.

Background: Text-Based Ideal Point (TBIP) model
TBIP (Vafa et al., 2020) is a probabilistic model which aims to quantify political positions (i.e. ideal points) from politicians' speeches and tweets via Poisson factorisation. In its generative process, political text is generated from the interactions of several latent variables: the per-document topic intensity $\theta_{dk}$ for $K$ topics and $D$ documents, the $V$-vectors representing the topics $\beta_{kv}$ with vocabulary size $|V|$, the ideal point of author $s$ expressed as a real-valued scalar $x_s$, and the ideological topic expressed by a real-valued $V$-vector $\eta_k$. In particular, the ideological topic $\eta_k$ aligns the neutral topic (e.g. gun, abortion, etc.) according to the author's ideal point (e.g. liberal, neutral, conservative), thus modifying the prominent words in the original topic (e.g. 'gun violence', or 'constitutional rights'). The observed variables are the author $a_d$ of a document $d$, and the word count for a term $v$ in $d$, encoded as $c_{dv}$. The TBIP model places a Gamma prior on $\beta$ and $\theta$, an assumption inherited from Poisson factorisation, with $m$, $n$ being hyperparameters:
$$\theta_{dk} \sim \mathrm{Gamma}(m, n), \qquad \beta_{kv} \sim \mathrm{Gamma}(m, n).$$
It instead places a normal prior over the ideological topic $\eta$ and the ideal point $x$:
$$\eta_{kv} \sim \mathcal{N}(0, 1), \qquad x_s \sim \mathcal{N}(0, 1).$$
The word count for a term $v$ in $d$, $c_{dv}$, can be modelled with a Poisson distribution:
$$c_{dv} \sim \mathrm{Pois}\Big(\sum_{k} \theta_{dk}\,\beta_{kv}\,\exp\{x_{a_d}\,\eta_{kv}\}\Big).$$

Brand-Topic Model (BTM)
Inspired by the TBIP model, we introduce the Brand-Topic Model by reinterpreting the ideal point $x_s$ as a brand-polarity score $x_b$ expressing an ideal feeling derived from reviews related to a brand, and the ideological topics $\eta_{kv}$ as opinionated topics, i.e. polarised topics about brand qualities. Thus, a term count $c_{dv}$ for a product's reviews derives from the hidden variable interactions as $c_{dv} \sim \mathrm{Pois}(\lambda_{dv})$, where
$$\lambda_{dv} = \sum_{k} \theta_{dk}\,\beta_{kv}\,\exp\{x_{b_d}\,\eta_{kv}\},$$
$b_d$ denotes the brand of document $d$, and the priors over $\beta$, $\theta$, $\eta$ and $x$ are initialised according to the TBIP model. The intuition is that if a word tends to occur frequently in reviews with positive polarities, but the brand-polarity score for the current brand is negative, then the occurrence count of such a word will be reduced, since $x_{b_d}$ and $\eta_{kv}$ have opposite signs.
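To make this intuition concrete, consider a hedged single-topic illustration in which the Poisson rate is assumed to take the TBIP-style form $\theta\,\beta\,\exp(x_b\,\eta)$. The numerical values are invented purely for illustration and do not come from the experiments.

```latex
\begin{align*}
\lambda &= \theta\,\beta\,\exp(x_b\,\eta), \qquad \theta = 1,\ \beta = 2,\ \eta = 0.5,\\
x_b = +1 &:\quad \lambda = 2\,e^{0.5} \approx 3.30,\\
x_b = 0 &:\quad \lambda = 2,\\
x_b = -1 &:\quad \lambda = 2\,e^{-0.5} \approx 1.21.
\end{align*}
```

Under these assumed values, a word with a positive offset is expected about $e \approx 2.72$ times as often for a fully positive brand score as for a fully negative one, while a neutral score leaves the base rate unchanged.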

Distant Supervision and Adversarial Learning
Product reviews might contain opinions about products and more general users' experiences (e.g. delivery service), which are not strictly related to the product itself and could mislead the inference of a reliable brand-polarity score. Therefore, to generate topics which are mainly characterised by product opinions, we provide an additional distant supervision signal via their review ratings. To this aim, we use a sentiment classifier, a simple linear layer, over the generated document representations to infer topics that are discriminative of the review's rating.
In addition, to deal with the imbalanced distribution in the reviews, we design an adversarial mechanism linking the brand-polarity score to the topics, as shown in Figure 3. We contrastively sample adversarial training instances by reversing the original brand-polarity score ($x_b \in [-1, 1]$) and generating the associated representations. These reversed representations are fed into the shared sentiment classifier together with the original representations to maximise their distance in the latent feature space.
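The reversal step can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes a Poisson rate of the TBIP-style form (topic intensity times topic-word weight times the exponential of score times offset), uses the rates directly in place of sampled word counts, and encodes labels as -1, 0 and 1 so that reversing a label is a sign flip. The names `rates` and `adversarial_pair` are hypothetical.

```python
import math


def rates(theta_d, beta, eta, x_b):
    """Per-word Poisson rates, assuming the form sum_k theta * beta * exp(x_b * eta)."""
    n_topics, n_words = len(beta), len(beta[0])
    return [
        sum(theta_d[k] * beta[k][v] * math.exp(x_b * eta[k][v]) for k in range(n_topics))
        for v in range(n_words)
    ]


def adversarial_pair(theta_d, beta, eta, x_b, label):
    """Return (original, reversed) training instances as (rates, label) tuples.

    Labels are assumed to be -1 (negative), 0 (neutral) and 1 (positive), so the
    reversed instance negates both the brand score and the target label.
    """
    original = (rates(theta_d, beta, eta, x_b), label)
    reversed_ = (rates(theta_d, beta, eta, -x_b), -label)
    return original, reversed_


# Toy document: 1 topic, vocabulary = ["great", "broken"] (illustrative values).
theta_d = [1.0]
beta = [[1.0, 1.0]]
eta = [[0.8, -0.8]]   # "great" leans positive, "broken" leans negative
orig, rev = adversarial_pair(theta_d, beta, eta, x_b=0.5, label=1)
```

With a positive brand score the rate of "great" exceeds that of "broken"; after the sign flip the two rates swap, and the classifier is asked to predict the opposite label for the reversed instance.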
Gumbel-Softmax for Word Sampling. As discussed earlier, in order to construct document features for sentiment classification, we need to sample word counts from the Poisson distribution. However, directly sampling word counts from the Poisson distribution is not differentiable. In order to enable back-propagation of gradients, we apply Gumbel-Softmax (Jang et al., 2017; Joo et al., 2020), which is a gradient estimator with the reparameterization trick.
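We can then draw samples $z_{dv}$ from the categorical distribution with class probabilities $\pi = (\pi_0, \pi_1, \cdots, \pi_{n-1})$ using:
$$u_i \sim \mathrm{Uniform}(0, 1), \quad g_i = -\log(-\log u_i), \quad w_i = \frac{\exp\big((\log \pi_i + g_i)/\tau\big)}{\sum_{j=0}^{n-1} \exp\big((\log \pi_j + g_j)/\tau\big)}, \quad z_{dv} = \sum_{i=0}^{n-1} w_i\,c_i,$$
where $\tau$ is a constant referred to as the temperature and $c$ is the outcome vector. By taking the weighted average of the possible word counts, the process becomes differentiable, and we use the sampled word counts to form the document representation, which is fed as input to the sentiment classifier.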
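A minimal sketch of this relaxed sampling step is given below. It assumes that the class probabilities are obtained by truncating the Poisson distribution to the outcomes 0 to n-1 and renormalising, and that the outcome vector is simply (0, 1, ..., n-1); the function names and the truncation level are illustrative choices, not details confirmed by the text.

```python
import math
import random


def truncated_poisson_probs(rate, n):
    """Poisson pmf over outcomes 0..n-1, renormalised to sum to 1."""
    pmf = [math.exp(-rate) * rate ** i / math.factorial(i) for i in range(n)]
    total = sum(pmf)
    return [p / total for p in pmf]


def gumbel_softmax_count(rate, n, tau, rng):
    """Relaxed word count: softmax((log pi_i + g_i) / tau) weighted by outcomes c_i = i."""
    pi = truncated_poisson_probs(rate, n)
    # Gumbel noise g_i = -log(-log(u_i)) with u_i ~ Uniform(0, 1)
    gumbel = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) for _ in range(n)]
    logits = [(math.log(p) + g) / tau for p, g in zip(pi, gumbel)]
    top = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * c for w, c in zip(weights, range(n)))


rng = random.Random(0)
z = gumbel_softmax_count(rate=2.0, n=10, tau=1.0, rng=rng)
```

The returned value is a real number between 0 and n-1 rather than an integer. Lower temperatures push the softmax weights towards a one-hot vector, so the relaxed count approaches a discrete sample at the cost of higher-variance gradients in an automatic-differentiation setting.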
Objective Function. Our final objective function consists of three parts: the Poisson factorisation model, the sentiment classification loss, and the reversed sentiment classification loss (for adversarial learning). For the Poisson factorisation modelling part, mean-field variational inference is used to approximate the posterior distribution (Jordan et al., 1999; Wainwright and Jordan, 2008; Blei et al., 2017).
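For optimisation, to minimise the divergence between the approximation $q_\phi(\theta, \beta, \eta, x)$ and the posterior, we equivalently maximise the evidence lower bound (ELBO):
$$\mathrm{ELBO} = \mathbb{E}_{q_\phi}\big[\log p(\theta, \beta, \eta, x) + \log p(c \mid \theta, \beta, \eta, x) - \log q_\phi(\theta, \beta, \eta, x)\big].$$
The Poisson factorisation model is pre-trained by applying the algorithm in Gan et al. (2015), which is then used to initialise the variational parameters of $\theta_d$ and $\beta_k$. Our final objective function is:
$$\mathcal{L} = -\mathrm{ELBO} + \lambda\,(\mathcal{L}_s + \mathcal{L}_a),$$
where $\mathcal{L}_s$ and $\mathcal{L}_a$ are the cross-entropy losses of sentiment classification for sampled documents and reversed sampled documents, respectively, and $\lambda$ is the weight balancing the two parts of the loss, which is set to 100 in our experiments.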
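The way the three parts might be combined can be sketched as below. The additive form, with the negative ELBO plus the two cross-entropy terms scaled by the weight, is an assumption made for illustration, as are the function names and the toy logits; only the weight of 100 comes from the text.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against an integer class index."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_norm - logits[target]


def total_loss(elbo, logits_orig, label, logits_rev, reversed_label, lam=100.0):
    """Assumed form: -ELBO + lam * (L_s + L_a), to be minimised."""
    loss_s = cross_entropy(logits_orig, label)          # sampled document
    loss_a = cross_entropy(logits_rev, reversed_label)  # reversed-score document
    return -elbo + lam * (loss_s + loss_a)


# Toy values: classes are 0 = negative, 1 = neutral, 2 = positive.
loss = total_loss(
    elbo=-250.0,
    logits_orig=[0.1, 0.2, 2.0], label=2,
    logits_rev=[1.5, 0.3, 0.0], reversed_label=0,
)
```

Because the ELBO is typically large in magnitude compared with a per-document cross-entropy, a large weight keeps the classification terms from being drowned out by the factorisation term.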

Experimental Setup
Datasets. We construct our dataset by retrieving reviews in the Beauty category from the Amazon review corpus (He and McAuley, 2016). Each review is accompanied by a rating score (between 1 and 5), the reviewer name and product meta-data such as product ID, description, brand and image. We use the product meta-data to relate a product to its associated brand. By selecting only brands with relatively more, and more balanced, reviews, we obtain a final dataset containing a total of 78,322 reviews from 45 brands. Reviews with a rating score of 1 or 2 are grouped as negative reviews; those with a score of 3 are neutral reviews; and the remaining are positive reviews. The statistics of our dataset are shown in Table 1. We can observe that our data is highly imbalanced, with far more positive reviews than negative and neutral ones.
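The rating-to-label grouping described above can be written as a small helper. The function name and the example ratings are illustrative only.

```python
from collections import Counter


def rating_to_label(rating):
    """Map a 1-5 star rating to a sentiment label: 1-2 negative, 3 neutral, 4-5 positive."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer between 1 and 5")
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


# Toy ratings, skewed towards positive reviews as in most review corpora.
ratings = [5, 4, 5, 1, 3, 5, 2, 4]
label_counts = Counter(rating_to_label(r) for r in ratings)
```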
Baselines. We compare the performance of our model with the following baselines:
• Joint Sentiment-Topic (JST) model (Lin and He, 2009), built on LDA, can extract polarity-bearing topics from text provided that it is supplied with word prior sentiment knowledge. In our experiments, the MPQA subjectivity lexicon is used to derive the word prior sentiment information.
• SCHOLAR (Card et al., 2018), a neural topic model built on VAE. It allows the incorporation of meta-information such as document class labels into the model for training, essentially turning it into a supervised topic model.
Parameter setting. Since documents are represented as bags of words, which results in the loss of word ordering and structural linguistic information, frequent bigrams and trigrams such as 'without doubt' and 'stopped working' are also used as features for document representation construction. Tokens, i.e., n-grams (n = {1, 2, 3}), occurring fewer than twice are filtered out. In our experiments, we set aside 10% of the reviews (7,826 reviews) as the test set and use the remaining (70,436 reviews) as the training set. For hyperparameters, we set the batch size to 1,024, the maximum number of training steps to 50,000, the number of topics to 30, and the temperature in the Gumbel-Softmax equation in Section 3.2 to 1. Since our dataset is highly imbalanced, we balance the data in each mini-batch by oversampling. For a fair comparison, we report two sets of results from the baseline models, one trained on the original data, the other trained on balanced training data obtained by oversampling negative reviews. The latter results in an enlarged training set consisting of 113,730 reviews.
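One simple way to balance each mini-batch by oversampling is sketched below. The text does not specify the exact scheme, so drawing an equal number of examples per class with replacement is an assumption, and `balanced_batch` is a hypothetical helper.

```python
import random
from collections import Counter, defaultdict


def balanced_batch(examples, batch_size, rng):
    """Draw a mini-batch with equal class counts, sampling with replacement."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    labels = sorted(by_label)
    per_class = batch_size // len(labels)
    batch = []
    for label in labels:
        # Minority classes are oversampled because choices() samples with replacement.
        batch.extend(rng.choices(by_label[label], k=per_class))
    rng.shuffle(batch)
    return batch


# Toy corpus: 90 positive, 7 negative and 3 neutral reviews.
examples = [("p%d" % i, "positive") for i in range(90)]
examples += [("n%d" % i, "negative") for i in range(7)]
examples += [("m%d" % i, "neutral") for i in range(3)]
batch = balanced_batch(examples, batch_size=30, rng=random.Random(0))
class_counts = Counter(label for _, label in batch)
```

Sampling with replacement means minority-class reviews are repeated within and across batches, which balances the classifier's training signal without discarding any positive reviews.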

Experimental Results
In this section, we present the experimental results in comparison with the baseline models in brand ranking, topic coherence and uniqueness measures, and also present a qualitative evaluation of the topic extraction results. We further discuss the limitations of our model and outline future directions.
Brand Ranking. We report in Table 2 the brand ranking results generated by various models on the test set. Two commonly used evaluation metrics for ranking tasks, Spearman's correlation and Kendall's Tau, are used here. They penalise inversions equally across the ranked list. Both TBIP and BTM can automatically infer each brand's associated polarity score, which can be used for ranking. For both JST and SCHOLAR, we derive the polarity score of a brand by aggregating the sentiment probabilities of its associated review documents and then normalising over the total number of brand-related reviews. It can be observed from Table 2 that JST outperforms both SCHOLAR and TBIP. Balancing the distributions of sentiment classes improves the performance of JST and SCHOLAR. Overall, BTM gives the best results, showing the effectiveness of adversarial learning.
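As a rough sketch of how the two ranking metrics can be computed from inferred brand scores, the snippet below implements Spearman's correlation and Kendall's Tau in plain Python. It assumes there are no ties and that the reference ranking comes from average review ratings per brand; the brand scores and ratings are invented for illustration.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman's correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            score += (sign > 0) - (sign < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical brands: inferred polarity scores vs. average review ratings.
predicted = [0.62, -0.35, 0.10, 0.48]
reference = [4.5, 2.9, 4.1, 3.8]
rho = spearman_rho(predicted, reference)
tau = kendall_tau(predicted, reference)
```

In this toy example only one pair of brands is ordered differently by the two lists, which gives a Spearman's correlation of 0.8 and a Kendall's Tau of about 0.67.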

Topic Coherence and Uniqueness
Here we choose the top 10 words for each topic to calculate the context-vector-based topic coherence scores (Röder et al., 2015). In the topics generated by TBIP and BTM, we can vary the topic polarity scores to generate positive, negative and neutral subtopics as shown in Table 4. We would like to achieve high topic coherence, but at the same time maintain a good level of topic uniqueness across the sentiment subtopics since they express different polarities. Therefore, we additionally consider the topic uniqueness (Nan et al., 2019) to measure word redundancy among sentiment subtopics, $TU = \frac{1}{LK}\sum_{k=1}^{K}\sum_{l=1}^{L}\frac{1}{\mathrm{cnt}(l,k)}$, where $L$ is the number of top words, $K$ is the number of topics, and $\mathrm{cnt}(l, k)$ denotes the number of times word $l$ appears across the positive, neutral and negative topics under the same topic number $k$. We can see from Table 3 that both TBIP and BTM achieve higher coherence scores compared to JST and SCHOLAR. TBIP slightly outperforms BTM on topic coherence, but has a lower topic uniqueness score. As will be shown in Table 4, topics extracted by TBIP contain words that overlap significantly among sentiment subtopics. SCHOLAR gives the highest topic uniqueness score. However, it cannot separate topics with different polarities. Overall, our proposed BTM achieves the best balance between topic coherence and topic uniqueness.
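The topic uniqueness measure for a single topic can be sketched as follows, assuming it is computed as the average of 1/cnt over the top words of that topic's positive, neutral and negative subtopics. The function name and the toy word lists are illustrative and are not taken from Table 4.

```python
from collections import Counter


def topic_uniqueness(subtopics):
    """Mean of 1/cnt(l, k) over the top words of the sentiment subtopics of one topic.

    `subtopics` maps a polarity name to its list of top words; cnt(l, k) is the
    number of subtopics of this topic in which word l appears.
    """
    counts = Counter(word for words in subtopics.values() for word in set(words))
    inverse = [1.0 / counts[word] for words in subtopics.values() for word in words]
    return sum(inverse) / len(inverse)


# Toy topic with three top words per subtopic (illustrative values only).
brush = {
    "negative": ["cheap", "plastic", "flimsy"],
    "neutral": ["brush", "cheap", "bristles"],
    "positive": ["cheap", "pretty", "soft"],
}
tu = topic_uniqueness(brush)
```

The score is 1 when no word is shared between subtopics and falls to 1/3 when all three subtopics list exactly the same words, so higher values indicate less redundancy.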

Table 3: Topic coherence/uniqueness measures of results generated by various models.

Example Topics Extracted from Amazon Reviews
We illustrate some representative topics generated by TBIP and BTM in Table 4. It is worth noting that we can generate a smooth transition of topics by varying the topic polarity score gradually, as shown in Figure 1. Due to space limits, we only show topics when the topic polarity score takes the value of −1 (negative), 0 (neutral) and 1 (positive). It can be observed that TBIP fails to separate subtopics bearing different sentiments. For example, all the subtopics under 'Duration' express a positive polarity. In contrast, BTM shows better-separated sentiment subtopics. For 'Duration', we see positive words such as 'comfortable' under the positive subtopic, and words such as 'stopped working' clearly expressing negative sentiment under the negative subtopic. Moreover, the top words under different sentiment subtopics largely overlap with each other for TBIP, whereas we observe a more varied vocabulary in the sentiment subtopics for BTM.
TBIP was originally proposed to deal with political speeches in which speakers holding different ideal points tend to use different words to express their stance on the same topic. This is however not the case in Amazon reviews, where the same word could appear in both positive and negative reviews. For example, 'cheap' for lower-priced products could convey a positive polarity to express value for money, but it could also bear a negative polarity implying poor quality. As such, it is difficult for TBIP to separate words under different polarity-bearing topics. In contrast, with the incorporation of adversarial learning, our proposed BTM is able to extract different sets of words co-occurring with 'cheap' under topics with different polarities, thus accurately capturing the contextual polarity of the word 'cheap'. For example, 'cheap' appears in both positive and negative subtopics for 'Brush' in Table 4. But we can find other co-occurring words such as 'pretty' and 'soft' under the positive subtopic, and 'plastic' and 'flimsy' under the negative subtopic, which help to infer the contextual polarity of 'cheap'.
TBIP also appears to have difficulty in dealing with highly imbalanced data. In our constructed dataset, positive reviews significantly outnumber both negative and neutral ones. In many topics extracted by TBIP, all the sentiment subtopics convey a positive polarity. One example is the 'Duration' topic under TBIP, where words such as 'great' and 'great price' appear in all of the positive, negative and neutral subtopics. With the incorporation of supervised signals such as the document-level sentiment labels, our proposed BTM is able to derive better-separated polarised topics.
As shown in the example in Figure 1, if we vary the polarity score of a topic from −1 to 1, we observe a smooth transition of its associated topic words, gradually moving from negative to positive. Under the topic (shaver) shown in this figure, four brand names appear: REMINGTON, NORELCO, BRAUN and LECTRIC SHAVE. The first three brands can be found in our dataset. REMINGTON appears on the negative side and indeed has the lowest review score among these three brands; NORELCO appears most often and is indeed a popular brand with mixed reviews; and BRAUN has the highest score among these three brands, which is also consistent with the observations in our data. Another interesting finding is the brand LECTRIC SHAVE, which is not one of the brands in our dataset. But we could predict from the results that it is a product with relatively good reviews.

Limitations and Future work
Our model requires the use of a vanilla Poisson factorisation model to initialise the topic distributions before applying the adversarial learning mechanism of BTM to perform a further split of topics based on varying polarities. Essentially, topics generated by a vanilla Poisson factorisation model can be considered as parent topics, while polarity-bearing subtopics generated by BTM can be considered as child topics. Ideally, we would like the parent topics to be either neutral or carrying a mixed sentiment, which would better facilitate the learning of polarised subtopics. In cases where parent topics carry either strongly positive or strongly negative sentiment signals, BTM would fail to produce polarity-varying subtopics. One possible remedy is to filter out topics with strong polarities at an early stage. For example, topic labeling (Bhatia et al., 2016) could be employed to obtain a rough estimate of initial topic polarities; these labels would in turn be used for filtering out topics carrying strong sentiment polarities. Although the adversarial mechanism tends to be robust with respect to class imbalance, the disproportion of available reviews with different polarities could hinder the model performance. One promising approach suitable for the BTM adversarial mechanism would consist in decoupling the representation learning and the classification, as suggested in Kang et al. (2020), preserving the original data distribution used by the model to estimate the brand score.

Conclusion
In this paper, we presented the Brand-Topic Model, a probabilistic model which is able to generate polarity-bearing topics of commercial brands. Compared to other topic models, BTM infers real-valued brand-associated sentiment scores and extracts fine-grained sentiment-topics which vary smoothly in a continuous range of polarity scores. It builds on the Poisson factorisation model, combining it with an adversarial learning mechanism to induce better-separated polarity-bearing topics. Experimental evaluation on Amazon reviews against several baselines shows an overall improvement of topic quality in terms of coherence, uniqueness and separation of polarised topics.