AX-MABSA: A Framework for Extremely Weakly Supervised Multi-label Aspect Based Sentiment Analysis

Aspect Based Sentiment Analysis is a dominant research area with potential applications in social media analytics, business, finance, and health. Prior works in this area are primarily based on supervised methods, with a few techniques using weak supervision limited to predicting a single aspect category per review sentence. In this paper, we present an extremely weakly supervised multi-label Aspect Category Sentiment Analysis framework which does not use any labelled data. We rely only on a single word per class as initial indicative information. We further propose an automatic word selection technique to choose these seed categories and sentiment words. We explore unsupervised language model post-training to improve the overall performance, and propose a multi-label generator model to generate multiple aspect category-sentiment pairs per review sentence. Experiments conducted on four benchmark datasets show that our method outperforms other weakly supervised baselines by a significant margin.


Introduction
Aspect-based sentiment analysis (ABSA) is a well-known sentiment analysis task which provides more fine-grained information than simple sentiment understanding (Liu, 2012). The main goal of ABSA is to find the aspects and their associated sentiments within a given text. While the works on ABSA have expanded in different directions, it has primarily two sub-tasks, Aspect Term Sentiment Analysis (ATSA) and Aspect Category Sentiment Analysis (ACSA) (Xue and Li, 2018). ATSA consists of different tasks like aspect term extraction (Li et al., 2018; Luo et al., 2019; Li et al., 2020a; Shi et al., 2021), aspect term sentiment classification (He et al., 2018; Chen and Qian, 2019; Hou et al., 2021), opinion term extraction (Dai and Song, 2019; He et al., 2019; Chen and Qian, 2020b), aspect-oriented opinion term extraction (Fan et al., 2019; Wu et al., 2020a), aspect-opinion pair extraction (Zhao et al., 2020), etc. For example, in the sentence "The sushi is top-notch, the waiter is attentive, but the atmosphere is dull.", ATSA would extract the aspect terms 'sushi', 'waiter' and 'atmosphere'; the opinion terms 'top-notch', 'attentive', and 'dull'; and their associated sentiments 'positive', 'positive' and 'negative'. The other sub-task, ACSA, aims to find the higher-order aspect categories and their associated sentiments from a given text. In the above example, ACSA would detect the categories as 'food' (as 'sushi' is a type of 'food'), 'service' and 'ambience'; and the associated sentiments as 'positive', 'positive' and 'negative'.
Code, data and model are available at https://github.com/sabyasachi-kamila/AX-MABSA
Existing research on ABSA is dominated by supervised methods, where labelled training data is provided (Chen et al., 2017; Xue and Li, 2018; Cai et al., 2021; Liu et al., 2021; Xu et al., 2021; Yan et al., 2021). A few works try to solve the problem in a weakly/semi-supervised manner, where a few labelled samples are provided (Wang et al., 2021a). However, there has been a lack of study on ABSA using unsupervised methods, i.e., without using any labelled data. A few works also focused on unsupervised aspect term extraction (Shi et al., 2021). However, such works do not deal with the sentiment associated with the aspects. An existing work on weakly supervised ACSA (Huang et al., 2020) only considered a single aspect category per sentence, thus limiting the task to a large extent.
Motivated by the above, in this work, we present a methodology for the extremely weakly supervised ACSA task, where we do not need any labelled training samples. We solve both the aspect category detection (ACD) and ACSA tasks (on each review sentence) just by using the surface text of the aspect category and sentiment. Given $N$ review sentences, $C$ categories of interest and $P$ polarities of interest, the ACD task generates $C$ clusters, while the ACSA task generates $(c_i, p_j)$ tuples where $c_i \in C$ and $p_j \in P$. As in (Wang et al., 2021b), we adopt the representation learning perspective, wherein representing sentences by class names leads to better clustering. We only use the surface text of the class names and unlabelled sentences to get aspect category and sentiment clusters.
However, in clustering, each review sentence would get only one label, thus limiting the task to a substantial extent. To tackle this, we propose X-MABSA, a multi-label generator model which makes use of a dependency parser (Qi et al., 2020) and a similarity-based attention mechanism to generate multiple categories and associated sentiment polarity labels for each review sentence. In addition, we find that sometimes the representative text of aspect categories (provided as input) is not present (or is sparse) in the text corpus. This might lead to a skewed representation of the classes in our framework and thus degrade performance. Therefore, we present an automatic surface word selection strategy which represents the class names better. We combine this with our X-MABSA model and denote it as AX-MABSA.
We also showcase that unsupervised post-training of a language model on domain-specific data significantly improves the sentence representation and thus achieves better results for ACSA tasks. For this, we post-train the BERT language model (Devlin et al., 2019) using domain-specific unlabelled data. We perform experiments on four different benchmark aspect-based datasets (Pontiki et al., 2014, 2015, 2016; Cheng et al., 2017), and compare with different supervised and weakly supervised baselines. Our main contributions are as follows:
• an extremely weakly supervised method to solve the ACSA task without relying on any labelled data, using the class names as the only provided information;
• an automatic surface word selection strategy for choosing a suitable word corresponding to each aspect and sentiment class;
• use of BERT language model post-training on domain-specific unlabelled data for semantic representation of review sentences;
• a multi-label generator model which makes use of a dependency parser and a similarity-based attention mechanism for generating multiple aspect-sentiment labels for each sentence; and
• experimental results comparing our architecture with different existing baselines on four benchmark aspect datasets.

Related Work
Aspect Based Sentiment Analysis (ABSA) has gained significant attention for a long time, and research has been done in primarily two directions: Aspect Term Sentiment Analysis (ATSA) and Aspect Category Sentiment Analysis (ACSA).

Aspect Term Sentiment Analysis
Research on ATSA spans different subcategories, such as the following.
Aspect Term Extraction: In this sub-task, aspect terms associated with a category are extracted from a given text. Prior research on this is based on the sequence labelling problem (Ma et al., 2019; Li et al., 2020a). Li and Lam (2017) [...] (Lewis et al., 2020)-based model to solve all ATSA tasks. Cai et al. (2021) introduced a new task called aspect-category-opinion-sentiment quadruple extraction, and a BERT (Devlin et al., 2019)-based model to deal with implicit aspects and opinion terms. Xu et al. (2021) proposed a new span-level method for aspect-sentiment-opinion triplet extraction.

Aspect Category Sentiment Analysis
Aspect Category Sentiment Analysis (ACSA) finds aspect categories and their associated sentiments from a text. Research on this has been conducted on both Aspect Category Detection (ACD) and ACSA tasks. Ma et al. (2018) proposed a word attention-based hierarchical model which takes common-sense knowledge for solving the ACSA task. Xue and Li (2018) presented a novel CNN (LeCun et al., 1995)-based model for the ACSA task. Liang et al. (2019) proposed an aspect-guided encoding scheme which was able to perform aspect reconstruction. Sun et al. (2019) constructed an auxiliary text for aspects and reformed ACSA as a classification task. Wang et al. (2020) proposed a novel dependency tree-based model and a relational graph attention network for encoding the sentences. Li et al. (2020b) designed a multi-instance framework for the multi-label ACSA task. Cai et al. (2020) reformed the task as sentiment-category with a two-layer hierarchy where the higher layer detected the sentiment while the lower layer detected the aspect category. Liang et al. (2021) presented a semi-supervised framework having a beta distribution-based model. The model finds semantically related words from the context of a target aspect. Liu et al. (2021) solved the ACSA task as a text generation problem using BART (Lewis et al., 2020). Zhang et al. (2021) presented the aspect sentiment quad prediction task, where ACSA was formulated as a paraphrase generation task.
Almost all existing works on ACSA are based on supervised methods. In contrast, this work proposes a method for ACSA which does not require any labelled data and relies only on seed text for aspect class names.

Proposed Methodology
Our proposed method, AX-MABSA, works on the following components: (a) class name-based clustering, (b) unsupervised language model post-training on domain-specific data for better contextual representation of review sentences, (c) a multi-label generator model to generate multiple categories and associated sentiment labels, and (d) automatic class-representative text selection. The overall framework is depicted in Figure 1.
Problem Formulation: We formulate the extremely weakly supervised ACD and ACSA tasks as follows. Consider as input a review sentence $x = \{x_1, x_2, x_3, \ldots, x_n\}$, where $x_i$ is the $i$-th word of the sentence and $n$ is the length of the sentence, along with a list of $C$ predefined aspect categories. The output for the ACD task is a set $c$ of categories for the sentence, where $c \subset C$. For the ACSA task, the output is a list of tuples $(c_j, p_k)$, where $c_j$ is the $j$-th predicted category and $p_k$ is the $k$-th predicted sentiment polarity corresponding to the category $c_j$. The sentiment polarity $p$ is from the set $s \subset \{\text{positive}, \text{negative}\}$.

ACSA Module
As a primary task, we address the aspect detection based on the seed aspect categories provided as input. We adopt the X-Class model as presented in (Wang et al., 2021b) for solving extremely weakly supervised classification tasks majorly on topic modelling datasets. This module involves four stages: (a) word representations, (b) class representation, (c) class-specific document representation, and (d) document-class alignment.
For word representations, at first, a vocabulary is created from all the input texts. Then, each word's contextual representation is obtained using a pre-trained BERT language model (Devlin et al., 2019). The contextual embeddings of each word are averaged to obtain the review sentence encoding, and this representation is denoted as the static word representation $s_r$.
Here, $z_{i,j}$ is the contextualized representation and $R_{i,j}$ is the $j$-th word in the review sentence $R_i$.
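The averaging described above can be written compactly. The following is a reconstruction that assumes the definition mirrors the static word representation of X-Class (Wang et al., 2021b), with $r$ denoting a vocabulary word; the exact form is an assumption rather than a quotation:

```latex
s_r = \frac{\sum_{R_{i,j} = r} z_{i,j}}{\sum_{R_{i,j} = r} 1}
```

Under this reading, $s_r$ is the mean of the contextualized vectors $z_{i,j}$ over every position $(i, j)$ at which the word $r$ occurs.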
For class representation, the representations of the aspect class names are constructed based on the static representations of those words. For example, the category "sports" is represented by the contextual embedding of the word "sports". Then an expansion technique is used to find words similar to each class name within the input texts, and those words' contextual representations are averaged to obtain the final aspect class embedding.
In class-specific document representation, the representations of the documents or the sentences are guided by the class representations so that the sentences become more aligned to the topics of interest, i.e., the class names. Different attention mechanisms are used over the document representations guided by class representations to get updated document representations. Finally, for document-class alignment, clustering algorithms are used to cluster n documents into c clusters (c is the number of classes), wherein the seed class centroids are initialized with the class representations.
Clustering Algorithms: We tried different centroid-based clustering algorithms such as K-Means (Lloyd, 1982), Mini-batch K-Means (Sculley, 2010) and Gaussian Mixture Model (GMM) (Duda et al., 1973), and found that, in general, Mini-batch K-Means (mk-means) performs best for the ACD task while GMM performs best for the ACSA task. So, we fix these choices for our experiments. We used Principal Component Analysis (Abdi and Williams, 2010) for dimensionality reduction of the sentence representation and class representation vectors before clustering. The target dimension is set to 64. We also fixed random_state to 42 for centroid initialization. For mk-means, we used a batch size of 400.
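The idea of seeding the cluster centroids with class representations can be illustrated with a minimal sketch. It uses plain Lloyd-style k-means in pure Python rather than the mini-batch or GMM variants used in the experiments, omits the PCA step, and runs on hypothetical two-dimensional vectors; the function names are illustrative only.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def seeded_kmeans(points, seed_centroids, iterations=10):
    """Lloyd's k-means whose initial centroids are the class representations."""
    centroids = [list(c) for c in seed_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy "sentence representations" and two class seeds (hypothetical values).
sentences = [(0.9, 0.1), (1.0, 0.0), (0.1, 0.8), (0.0, 1.0)]
class_seeds = [(1.0, 0.2), (0.2, 1.0)]  # cluster 0 ~ "food", cluster 1 ~ "service"
labels, centroids = seeded_kmeans(sentences, class_seeds)
print(labels)
```

Because each centroid starts at a class representation, cluster index k can be read directly as class k, with no separate step to match clusters to class names.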
The model requires the surface text of the class names to be present in the dataset a certain number of times. We feel this is a potential drawback in solving our ACSA task, as the surface text of some category names may not be present in the dataset or may not have a proper meaning representation. For example, the category word "miscellaneous" might have no clear meaning and sometimes might not be present in the dataset. To resolve this issue, we explicitly add the category name to the vocabulary set if it is not found in the dataset. Another drawback of the above approach is that it can only predict one label per sentence. This is a huge limitation, especially when multiple aspect categories are present in a review sentence. In the following sections, we tackle these issues to propose a robust multi-aspect extraction framework.

AX-SABSA
We observed that the performance of the implemented ACSA module based on X-Class is poor. One of the reasons is that the word representations are based on the pre-trained BERT language model (Devlin et al., 2019), which gives more general representations of each word and works well for topic modelling tasks. However, the aspect terms are more specific to the domains, and thus a general representation does not provide specific information. Therefore, we suggest that unsupervised post-training of BERT on domain-specific data would lead to better word representations and thus better performance.

Unsupervised Post-training of BERT Language Model (UPBERT):
We follow a recent model (Gao et al., 2021) which feeds the same input to the encoder twice, once with one set of dropout masks and once with a different set. The model optimizes the contrastive objective of Gao et al. (2021), in which $a$ is the hidden state, a function of the input sentence and the dropout masks $p$ and $p'$. We feed our collected domain-specific unlabelled data of varied sizes to this model and fine-tune the BERT model. For our experiments, we use a batch size of 128, a sequence length of 32, a learning rate of 3e-5, and the Multiple Negatives Ranking Loss of the sentence-transformer model (Reimers and Gurevych, 2019) as the loss function. We vary the dataset size starting from 10k samples for training, and obtain different fine-tuned BERT models corresponding to different data sizes. Finally, we select the fine-tuned model which provides the best performance (using 80k unlabelled training sentences in our case), apply this UPBERT model for word representation to solve the ACSA task, and call our single-label predictor model X-SABSA.
Automatic Class-representative Surface Text Selection Algorithm (ACSSA): Our model suffers when the surface text of class names is not present on the data. Although we add these words to the vocabulary explicitly, their contextual representations become poor. As an immediate solution, we can manually select candidate words corresponding to class names. However, this would be a difficult and tedious job when the number of categories is high. Also, there can be multiple candidate words for a class name. For example, to represent the category "ambience", one can choose any of the following words: atmosphere, environment, vibes, etc. Similarly, to represent a negative polarity, one can choose any of the following words: bad, problem, pathetic, poor, etc. Depending upon the words we choose, the overall performance varies significantly. So, we propose an algorithm that selects these candidate words automatically given the original class names (see Algorithm 1).
The ACSSA algorithm takes a particular part-of-speech tag (noun for the ACD task, adjective for the ACSA task), the dataset, the vocabulary and the class names as input and produces a candidate word list as output. Initially, it creates a list uV of words from the vocabulary which have the desired part-of-speech tag. It then finds all the similar words for each class name from the list uV, sorts the similar words for each class according to their cosine similarity scores, and selects the top-T entries from each list. T can be varied depending upon inspection; we fixed it to 10 based on experimental results. Finally, we sort each list according to the number of occurrences in the dataset and select the most frequently occurring word from each list as the aspect class representative. This produces a single candidate word for each class. Thus, the AX-SABSA module uses ACSSA in combination with X-SABSA to automatically generate better aspect category names.
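The selection steps described above can be sketched as follows. This is a minimal illustration and not the authors' Algorithm 1: the function name, the argument layout, and the toy vectors, tags and counts are all hypothetical, and word and class embeddings are assumed to be precomputed.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_seed_words(class_vectors, word_vectors, pos_tags, counts, wanted_pos, top_t=10):
    """Pick one representative word per class, following the ACSSA steps."""
    # uV: vocabulary words carrying the desired part-of-speech tag.
    candidates = [w for w in word_vectors if pos_tags.get(w) == wanted_pos]
    selected = {}
    for name, class_vec in class_vectors.items():
        # Rank candidates by cosine similarity to the class name, keep the top T.
        ranked = sorted(candidates, key=lambda w: cosine(word_vectors[w], class_vec), reverse=True)
        shortlist = ranked[:top_t]
        if shortlist:
            # Among the shortlist, choose the word occurring most often in the dataset.
            selected[name] = max(shortlist, key=lambda w: counts[w])
    return selected


# Toy inputs (hypothetical embeddings, tags and corpus frequencies).
class_vectors = {"ambience": (1.0, 0.0)}
word_vectors = {
    "atmosphere": (0.9, 0.1),
    "vibes": (0.95, 0.05),
    "pizza": (0.1, 0.9),
    "cozy": (0.9, 0.2),
}
pos_tags = {"atmosphere": "NOUN", "vibes": "NOUN", "pizza": "NOUN", "cozy": "ADJ"}
counts = Counter({"atmosphere": 50, "vibes": 5, "pizza": 80, "cozy": 30})
seeds = select_seed_words(class_vectors, word_vectors, pos_tags, counts, "NOUN", top_t=2)
print(seeds)
```

In this toy run, 'pizza' is the most frequent noun but is excluded by the similarity shortlist, so frequency only breaks ties among words that are already close to the class name.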

AX-MABSA
Since clustering produces only one label for each review sentence, we propose a Multi-label Generator model based on a dependency parser (Qi et al., 2020) and a similarity-based attention mechanism.
Multi-label Generator Model: This model takes the unlabelled sentences, the sentence representations, the category class representations, and the clustering outputs to generate multiple categories and the associated sentiment polarity for each sentence. We illustrate the model using the following example: "The food was good, but it's not worth the wait or the lousy service". The sentence has tags '(food, positive)' and '(service, negative)'.
Parsing the Input Sentences: The unlabelled input sentences are parsed by an off-the-shelf dependency parser (Qi et al., 2020). The parser outputs dependency pairs of the form (word, word[head-1]). The output for the above sentence can be seen in Figure 2. For each word with a Noun part-of-speech tag in the sentence, we select those pairs where either the word or word[head-1] is also a Noun. We call this final set of pairs 'PPairs'. The 'PPairs' for the above sentence are ('food', 'good'), ('wait', 'worth'), and ('service', 'wait'). Observe that, in general, the first word in a pair is related to aspects while the second word is associated with sentiment.
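The pair extraction can be sketched on a hand-written parse of the example sentence. The parse below is an illustration written by hand, not actual parser output, and the `Token` type and `extract_ppairs` function are hypothetical names. The sketch keeps pairs whose dependent word is a noun, an assumption chosen because it reproduces the three PPairs listed above; head indices are 1-based with 0 for the root, matching the (word, word[head-1]) convention.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    upos: str
    head: int  # 1-based index of the syntactic head, 0 for the root


def extract_ppairs(tokens):
    """Return (word, head word) pairs whose dependent word is a noun."""
    pairs = []
    for tok in tokens:
        if tok.upos == "NOUN" and tok.head > 0:
            pairs.append((tok.text, tokens[tok.head - 1].text))
    return pairs


# Hand-written dependency parse of the example sentence (illustrative only).
parsed = [
    Token("The", "DET", 2), Token("food", "NOUN", 4), Token("was", "AUX", 4),
    Token("good", "ADJ", 0), Token(",", "PUNCT", 10), Token("but", "CCONJ", 10),
    Token("it", "PRON", 10), Token("'s", "AUX", 10), Token("not", "PART", 10),
    Token("worth", "ADJ", 4), Token("the", "DET", 12), Token("wait", "NOUN", 10),
    Token("or", "CCONJ", 16), Token("the", "DET", 16), Token("lousy", "ADJ", 16),
    Token("service", "NOUN", 12),
]
ppairs = extract_ppairs(parsed)
print(ppairs)
```

With a real parser, the `Token` fields would be filled from the parser's word text, universal part-of-speech tag and head index.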
Similarity-based Attention: We use a similarity-based attention mechanism to assign a desired class label to each of the words in the PPairs. We first obtain the similarity values between the words in the sentence and all the class names using the cosine similarity: $S_{i,j} = \cos(w_i, c_j)$. Now, we calculate $\max_c(S)$, which assigns each word to its most similar class. For each aspect word in the 'PPairs', if the corresponding $\max_c(S)$ is greater than a threshold, we keep that pair. We call these filtered pairs 'FPPairs'. Finally, we assign the aspect and sentiment labels to each of the 'FPPairs' based on its corresponding $\max_c(S)$ values. If the 'FPPairs' contain only one pair or are empty, we take the clustering outputs as the predicted aspect and sentiment pair.
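The filtering and labelling steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold value of 0.5 is hypothetical (the text does not give it here), the sentiment label is assumed to come from the second word of each pair compared against the polarity class vectors, and all vectors are toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_pairs(ppairs, word_vecs, aspect_vecs, polarity_vecs, cluster_output, threshold=0.5):
    """Assign (aspect, polarity) labels to PPairs, falling back to clustering."""

    def best_class(word, class_vecs):
        scores = {name: cosine(word_vecs[word], vec) for name, vec in class_vecs.items()}
        name = max(scores, key=scores.get)
        return name, scores[name]

    # FPPairs: pairs whose aspect word is similar enough to some aspect class.
    fppairs = [(a, s) for a, s in ppairs if best_class(a, aspect_vecs)[1] > threshold]
    if len(fppairs) <= 1:
        return [cluster_output]  # zero or one pair: use the clustering prediction
    labels = []
    for aspect_word, sentiment_word in fppairs:
        aspect, _ = best_class(aspect_word, aspect_vecs)
        polarity, _ = best_class(sentiment_word, polarity_vecs)
        if (aspect, polarity) not in labels:
            labels.append((aspect, polarity))
    return labels


# Toy 4-d embeddings: (food-ness, service-ness, positive, negative); hypothetical.
aspect_vecs = {"food": (1, 0, 0, 0), "service": (0, 1, 0, 0)}
polarity_vecs = {"positive": (0, 0, 1, 0), "negative": (0, 0, 0, 1)}
word_vecs = {
    "food": (1, 0, 0, 0), "good": (0, 0, 1, 0), "service": (0, 1, 0, 0),
    "wait": (0.2, 0.3, 0, 0.9), "worth": (0, 0, 0.5, 0.5),
}
ppairs = [("food", "good"), ("wait", "worth"), ("service", "wait")]
predicted = label_pairs(ppairs, word_vecs, aspect_vecs, polarity_vecs, ("food", "positive"))
print(predicted)
```

In this toy run the pair ('wait', 'worth') is filtered out because 'wait' is not close to any aspect class, and the remaining two pairs yield the gold tags of the example sentence.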
In the entire setup, we use the UPBERT model mentioned in Section 3.2 for word representation. We refer to the entire model as X-MABSA. When we use the automatic surface word selection algorithm ACSSA in the X-MABSA model, we call that final model AX-MABSA.

Experimental Setup
We discuss here the datasets we have used, word representations, and different baselines we have selected for our experiments.

Word Representations
We consider a pre-trained language model called BERT (Devlin et al., 2019) (in particular, we chose the 'bert-base-uncased' model which has 110M parameters). BERT follows a transformer model (Vaswani et al., 2017) for its representation, where the model predicts the masked words using the surrounding context words. We obtain a vector representation for each word of a given sentence using the BERT language model. We use BERT for both word representations and the post-training tasks.

Baselines
We compare the performance of the proposed model with diverse types of baselines such as random, supervised, and weakly supervised methods.
• Random: At first, we present a random baseline where the predictions are generated using a uniform distribution. This provides a lower bound for our evaluation.
• Supervised: A recent supervised method, ACSA-generation (Liu et al., 2021), solves ACSA as a generation task. The training and test sets are structured with some predefined templates. Finally, the authors used BART (Lewis et al., 2020), a denoising autoencoder, for generating the desired outputs. This gives an approximate upper bound for our evaluation.
• Weakly Supervised: A weakly supervised method, JASen (Huang et al., 2020), takes unlabelled training reviews and a few keywords corresponding to each aspect category and sentiment polarity, and outputs an (aspect, sentiment) pair for each review. The authors only considered sentences with a single aspect category.
• Extremely Weakly Supervised: The method X-Class (Wang et al., 2021b) takes reviews and a single keyword per class name as inputs and predicts a single class for each review. The method was validated majorly on different topic modelling datasets.

Experimental Evaluation
In this section, we study the performance of the different algorithms on four datasets, compare them with different baselines, and discuss the qualitative analysis of our model performance.

Evaluation Framework
We evaluate our method in an End-to-End framework. The popularly used ABSA evaluation uses gold aspects as a part of the input to predict the sentiment polarity of each gold aspect. However, when the task is (almost) unsupervised, we do not expect to know the aspect categories beforehand, as has been explored in previous works involving sentiment mining alone. Thus, we follow the End-to-End framework, which has two stages. In the first stage, given sentences, all the aspects are predicted. In the second stage, for each aspect predicted in the first stage, the corresponding sentiment polarity is predicted. Therefore, in our case, the first-stage output is the ACD output, which gives the aspect categories corresponding to each sentence. The second-stage output is the ACSA output, which is a set of (aspect category, sentiment polarity) tuples for each sentence. We consider a prediction correct only if both the aspect category and the sentiment polarity are predicted correctly. Thus, the performance is measured over all (aspect, sentiment) tuples in the gold data.

Evaluation Metrics
We consider two metrics for performance evaluation. For the ACD task, we report the macro-averaged F1 score (or F1-macro), which is the average of the F1-scores per class. For the ACSA task, we report the macro-averaged F1-PN score (or macro F1-PN), which is the mean of the F1-scores of all (aspect category, sentiment) tuples with positive or negative sentiment.
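Both metrics can be computed with one helper over multi-label predictions, as in the sketch below. It is an illustration, not the authors' evaluation script: the function name is hypothetical, and taking the class list from the labels present in the gold data is an assumption based on the statement that performance is measured over all tuples in the gold data.

```python
def macro_f1(gold, predicted, classes):
    """Average per-class F1 over `classes`; gold/predicted are lists of label sets."""
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, predicted) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, predicted) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, predicted) if c in g and c not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two toy sentences with gold and predicted (category, polarity) tuples.
gold = [{("food", "positive")}, {("food", "positive"), ("service", "negative")}]
pred = [{("food", "positive")}, {("food", "positive"), ("service", "positive")}]

# ACD: compare categories only.
gold_cats = [{c for c, _ in labels} for labels in gold]
pred_cats = [{c for c, _ in labels} for labels in pred]
acd_f1 = macro_f1(gold_cats, pred_cats, sorted(set().union(*gold_cats)))

# ACSA (F1-PN): a tuple counts only if category and polarity both match.
pn_classes = sorted(t for t in set().union(*gold) if t[1] in ("positive", "negative"))
acsa_f1 = macro_f1(gold, pred, pn_classes)
print(acd_f1, acsa_f1)
```

The toy example shows why End-to-End ACSA scores sit below ACD scores: both categories are detected, but the wrong polarity for 'service' makes that tuple count as an error.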

Empirical Results
Comparative results of the ACD and ACSA tasks on different datasets are presented in Table 2. The results show that we achieve far better performance than random baselines given that our approach is unsupervised. The improvement of our multi-label models (X-MABSA and AX-MABSA) is statistically significant at p < 0.01 using a paired t-test (Hsu and Lachenbruch, 2014) compared to the proposed single-label models (X-SABSA and AX-SABSA) and the weakly supervised baselines (X-Class and JASen).
For the ACD task, we achieve baseline results for all the datasets (ACSA module). We obtain F1-macro of 46.69, 40.35, 36.58, and [...] MAMS dataset, respectively. The proposed X-SABSA model improves the performance significantly on all the datasets (F1-macro of 56.16, 58.87, 42.77, and [...] MAMS data, respectively). Within our proposed models, we find that our multi-label model X-MABSA performs better than the single-label model X-SABSA on all datasets. Especially on the MAMS dataset, it improves the performance significantly (F1-macro of 56.48). We also observe that the AX-MABSA model (i.e., when automatically selected candidate words are considered for class representation) further improves performance on [...] ([...] 50.63, and 60.82). This shows that the AX-MABSA model is more generalized and works very well when class names are not present in the input data.
As the ACSA task is framed as an End-to-End pipeline, we expect the performance to be lower than under the often-used ACSA evaluation procedure. We achieve the baseline results (ACSA module), which are F1-PN-macro scores of 34.44, 25.49, 24.83, and [...] MAMS, respectively. We find that the proposed X-SABSA model improves the performance significantly over the baseline (F1-PN-macro of 39.66, 42.55, [...], respectively). The multi-label model, X-MABSA, improves the results further (F1-PN-macro of 44.96, 44.35, 35.81, and 27.28, respectively). We also observe that the AX-MABSA model improves the performance on Rest-14, [...] ([...] 36.47, and 29.74, respectively). We observe that our proposed model performs significantly better than the random baseline and the two weakly supervised baselines (X-Class and JASen) on both ACD and ACSA tasks. As our method is extremely weakly supervised, we do not expect our model to be better than the supervised model. However, in comparison to the supervised model (ACSA-generation), our method shows promising performance. For example, on the Rest-14 data, the supervised model achieves an F1-macro of 91.42 while our proposed model achieves an F1-macro of 74.90 for the ACD task. For the ACSA task, the proposed method performs decently compared to the supervised baseline. For example, on Rest-15, the supervised method achieves an F1-PN-macro of 71.91 while our method achieves an F1-PN-macro of 44.35.
It is evident that our proposed method works comparatively poorly for the ACSA task on the MAMS data. The reason for this is the presence of a remarkably high number of 'neutral' classes (43.62% of total polarity labels). Selecting a single representative surface word for the 'neutral' class is difficult, as there is no association between any word and neutral sentences, as compared to the 'positive' and 'negative' classes. For example, the word 'bad' can be a representative of the 'negative' class and the word 'good' of the 'positive' class, but we found no such representative word for the neutral class that performs well.

Table 3: Illustration of the proposed method using few examples

Sentence: The sashimi is always fresh and the rolls are innovative and delicious.
Actual: (food, positive)
Predicted: (food, positive)

Sentence: While there's a decent menu, it shouldn't take ten minutes to get your drinks and 45 for a dessert pizza.
Actual: (food, positive), (service, negative)
Predicted: (food, positive), (service, positive)

Sentence: Who can't decide on a single dish, the tapas menu allowed me to express my true culinary self.
Actual: (food, negative), (menu, positive)
Predicted: (menu, negative)

Sentence: Roof: very nice space (although I know 5 other rooftop bars just as good), but the crowd was bunch of posers and the owner was a tool.
Actual: (place, positive), (miscellaneous, neutral)
Predicted: (place, positive), (ambience, negative)

Sentence: Endless fun, awesome music, great staff!
Actual: (service, positive), (ambience, positive), (restaurant, positive)
Predicted: (service, positive), (ambience, positive)

Performance Analysis
We report a few example texts with the original and model-predicted tags in Table 3. We find that in some cases our model combines two closely related categories into one. For example, the text "who can't decide on a single dish, the tapas menu allowed me to express my true culinary self." has the gold categories food and menu. Our model predicts it as menu. The reason is that both the words 'dish' and 'menu' got a higher similarity score to the category 'menu', which is reasonable. The fourth sentence in Table 3 has gold labels '(place, positive)' and '(miscellaneous, neutral)'. Our model predicts '(place, positive)' and '(ambience, negative)'. We see here that the 'miscellaneous' class has been misclassified as 'ambience' and 'neutral' as 'negative'. The 'miscellaneous' class is difficult to represent, even if we replace this word using the automatic surface word selection algorithm. Also, from the sentence, we can sense that 'ambience' can be a class with 'negative' polarity.
The fifth sentence in Table 3 has gold labels 'service', 'ambience' and 'restaurant'. However, our model predicts 'service' and 'ambience', missing the 'restaurant' category. This happened because there is no explicit presence of restaurant-related words in the sentence. Another point is that there are some mutual words related to both 'ambience' and 'restaurant'; for example, the word 'place' can be related to both 'ambience' and 'restaurant'.

Conclusion
In this paper, we studied extremely weakly supervised aspect category sentiment analysis across four benchmark datasets, and presented a state-of-the-art unsupervised framework that does not require any labelled data. Our method relied only on the surface text of aspect class names and unlabelled texts to extract aspect-sentiment pairs via a multi-label generator model. We proposed an automatic class-representative surface word selection algorithm to select proper representative words corresponding to each class. We also found that unsupervised post-training of language models on domain-specific data improved the word representations and thus improved the performance. Experiments show that our proposed method performs better than all weakly supervised baseline models. In the future, we intend to improve our methods to incorporate more sentiment classes. We believe that our work will foster more research interest in unsupervised ABSA.

Limitations
The main limitation of the proposed work is that it is unable to model the "neutral" sentiment class, and it performs significantly worse when the number of neutral sentiment reviews in a dataset is high. This is evident from the ACSA results on the MAMS data (in Table 2), where the number of neutral classes is high. We have also tried some possible neutral class related seed category words like 'okay', 'moderate', 'average', etc., but the performance did not improve. This shows that these words cannot represent the 'neutral' class. Thus, modelling the 'neutral' class efficiently would improve the model performance. Although our model performs better than other weakly supervised baselines, there is still considerable scope for improvement to bridge the gap with supervised methodologies.
Figure 1: Overview of the proposed AX-MABSA Framework

Figure 2: Sample Dependency Parser Output

Algorithm 1: Algorithm for Automatic Class-representative Surface Text Selection
Input: X (noun for ACD, adjective for ACSA), dataset D, vocabulary V, class names C
Output: A list selected containing candidate words for each class
Initialize a global array uV[], T, targetL[], sourceL[], interL[],

Table 1: Gold Data Statistics. Imbalance value signifies the ratio between the largest and smallest category size.

[...] datasets for sentence-level aspect category and aspect category sentiment. The Rest-14 data has five categories: food, service, ambience, price, and miscellaneous. Rest-15 and Rest-16 have restaurant, ambience, food, service, and drinks categories. The MAMS dataset has food, ambience, price, service, miscellaneous, staff, menu, and place categories. The test data size for all the datasets is reported in Table 1. Imbalance signifies the ratio between the largest class size and smallest class size.

Table 2: Comparative Results for the ACD and End-to-End ACSA tasks. We report F1-macro score for ACD and F1-PN macro score for ACSA. X-SABSA: Proposed single-label predictor model. AX-SABSA: Proposed single-label predictor model where the candidate word for each class is also updated. X-MABSA: Proposed multi-label predictor model. AX-MABSA: Proposed multi-label predictor model where the candidate word for each class is also updated. Clustering algorithm used: mini-batch k-means for ACD, and GMM for ACSA.