Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis

Both data deficiency and semantic consistency are important issues for data augmentation. Most previous methods address the first issue but ignore the second. In aspect-based sentiment analysis, violating semantic consistency may change the aspect and the sentiment polarity. In this paper, we propose a semantics-preserving data augmentation approach that considers the importance of each word in a textual sequence with respect to the related aspects and sentiments. We then substitute the unimportant tokens with two replacement strategies without altering the aspect-level polarity. Our approach is evaluated on several publicly available sentiment analysis datasets and on real-world stock price/risk movement prediction scenarios. Experimental results show that our method achieves better performance on all datasets.


Introduction
Data annotation, the first step of most artificial intelligence research, is often time-consuming and expensive. Many studies propose methodologies for augmenting data based on a few annotations (Wei and Zou, 2019; Jiao et al., 2020; Xie et al., 2020; Wu et al., 2019; Kobayashi, 2018) to reduce the cost of annotation. However, most of them struggle to ensure readability and semantic coherence. To overcome these problems, we introduce a novel method, selective perturbed masking (SPM), to measure the importance of each word in a sentence, and then replace the unimportant words with pre-trained language models. In this way, the semantics of a given instance is preserved, and the training set is enlarged with diverse auto-generated data.
Classification tasks are often adopted to test data augmentation methods, and sentiment analysis is one of the best-known classification tasks. To mine more fine-grained information, aspect-based sentiment analysis (ABSA) has been proposed. ABSA not only detects the sentiment polarity but also mines the aspect under analysis. It can be extended to the subtasks of aspect category sentiment classification (ACSC), aspect term sentiment classification (ATSC), and aspect term extraction (ATE). For example, for the sentence "the staff was so kind", the answers to these tasks are (service, positive), (staff, positive), and "staff", respectively. These tasks are commonly used to evaluate the performance of augmentation methods. In this paper, we explore both sentiment analysis and all ABSA subtasks to show the usefulness of the proposed method in preserving semantics.
In addition to sentiment analysis and ABSA tasks, we also experiment on stock price and risk movement prediction. These tasks have long been considered real-world application scenarios of sentiment analysis (Xu and Cohen, 2018). The experimental results on all kinds of datasets support the usefulness of the proposed data augmentation method, and also indicate that its improvement is larger than that of other augmentation methods. Our main contributions are summarized as follows: 1. We propose a semantics-preserved augmentation method, which provides more training data without changing the meaning of the given instances during augmentation.
2. Our experimental results show that the proposed method achieves better performance on four aspect-based sentiment analysis datasets and two sentiment analysis datasets.
3. An additional exploration of a real-world scenario, stock price/risk movement prediction, supports the robustness of the proposed method.
Related Work

Data Augmentation

Jiao et al. (2020) utilize word embeddings to obtain similar terms as substitutions. Wei and Zou (2019) add noise by randomly inserting, deleting, and swapping words, and by substituting words with synonyms. Kobayashi (2018) and Wu et al. (2019) consider label information and use a language model to randomly replace single words with more diverse substitutions. Xie et al. (2020) replace uninformative words based on TF-IDF scores. Although previous works show that their methods can improve performance on some NLP tasks, their data augmentation may change the meanings of the given instances. In this paper, we propose a method that not only provides more data but also retains the meanings of the original instances. Our experimental results show that the proposed method performs well on all datasets.

Aspect-Based Sentiment Analysis
Wang et al. (2016) propose an attention-based LSTM, which can concentrate on distinct parts of a sentence by calculating the corresponding attention weights, to learn aspect embeddings. Ma et al. (2017) find that interactive attention networks can learn the representations of the target and the context separately, which is helpful for sentiment classification. BERT-based (Devlin et al., 2019) methods have been shown to be effective on ABSA (Hoang et al., 2019). Because BERT-based methods achieve the best performance on most ABSA tasks, we adopt BERT as the base architecture to test the performance of data augmentation methods.

Methods
This section describes in detail how we combine an auxiliary sentence with SPM to decide which words are unimportant to the aspect or sentiment of a given instance. After identifying the unimportant words, we use the proposed token replacement strategies to construct augmented sentences in which the aspect and sentiment are not altered.

An Auxiliary Sentence Approach
Inspired by Sun et al. (2019) and Schick and Schütze (2021), we utilize an auxiliary sentence containing the aspect and the sentiment in a review. For each review, we concatenate the auxiliary sentence and the review with a special token [SEP]. For example, the auxiliary sentence "the polarity of the service is positive" and the review "the staff was so kind to us" are concatenated into "the polarity of the service is positive [SEP] the staff was so kind to us". An input sentence $S$ is thus formulated as follows: $S = [w^a_1, \dots, w^a_{aspect}, \dots, w^a_{sentiment}, w_{[SEP]}, w^r_1, \dots, w^r_N]$, where $w^a$ and $w^r$ denote words in the auxiliary sentence and the review, respectively; $w^a_{aspect}$ and $w^a_{sentiment}$ are the words denoting the aspect and the sentiment, respectively; and $N$ is the length of the review.
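The concatenation step above is straightforward string assembly; a minimal sketch (the function name `build_input` and the template wording are illustrative, following the paper's running example):

```python
def build_input(aspect: str, sentiment: str, review: str) -> str:
    """Concatenate the auxiliary sentence and the review with [SEP]."""
    auxiliary = f"the polarity of the {aspect} is {sentiment}"
    return f"{auxiliary} [SEP] {review}"

print(build_input("service", "positive", "the staff was so kind to us"))
# → the polarity of the service is positive [SEP] the staff was so kind to us
```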

Selective Perturbed Masking (SPM)
Wu et al. (2020) introduce perturbed masking (PM) to analyze syntactic information, and verify its effectiveness on syntactic parsing and discourse dependency parsing. In this work, we propose selective perturbed masking (SPM) to estimate the correlation between the tokens in reviews and the sentiment words (aspect terms) in the auxiliary sentence. The following procedure measures the impact of $w^r_i$ ($1 \le i \le N$) on predicting $w^a_{aspect}$ and $w^a_{sentiment}$, respectively. First, we replace $w^a_{aspect}$ ($w^a_{sentiment}$) with the special token [MASK] and use this word sequence as BERT's input for predicting the masked word; the output embedding at the masked position is denoted $E_a$. Note that the SPM variant that masks $w^a_{aspect}$ is named AS-SPM, and the variant that masks $w^a_{sentiment}$ is named Senti-SPM. Second, we replace both $w^a_{aspect}$ ($w^a_{sentiment}$) and $w^r_i$ with [MASK]; BERT's output embedding at the position of $w^a_{aspect}$ ($w^a_{sentiment}$) is taken as $E^r_i$. Third, we calculate the Euclidean distance (ED) between $E_a$ and each $E^r_i$; the smaller the distance, the less important $w^r_i$ is to the aspect (sentiment).
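The two-pass masking procedure can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_encode` is a deterministic stand-in for BERT (any `encode(tokens) -> (len(tokens), dim)` array works, e.g. a masked-LM's hidden states), and the function names are assumptions.

```python
import numpy as np

def toy_encode(tokens):
    """Stand-in for BERT: per-token vectors shifted by a context mean,
    so masking any token perturbs every position's embedding."""
    vecs = np.array([[len(t), sum(map(ord, t)) % 101] for t in tokens],
                    dtype=float)
    return vecs + vecs.mean(axis=0)

def spm_importance(tokens, target_idx, encode=toy_encode):
    """Score each review token's impact on the masked target word
    (the aspect for AS-SPM, the sentiment for Senti-SPM)."""
    # Pass 1: mask only the target; E_a is its contextual embedding.
    masked = list(tokens)
    masked[target_idx] = "[MASK]"
    e_a = encode(masked)[target_idx]

    sep = tokens.index("[SEP]")  # review tokens follow the separator
    scores = {}
    for i in range(sep + 1, len(tokens)):
        # Pass 2: additionally mask review token w_i, then re-read the
        # target position's embedding (E^r_i).
        doubly = list(masked)
        doubly[i] = "[MASK]"
        e_r_i = encode(doubly)[target_idx]
        # Larger Euclidean distance => token i matters more to the target.
        scores[i] = float(np.linalg.norm(e_a - e_r_i))
    return scores

tokens = "the polarity of the service is positive [SEP] the staff was so kind".split()
scores = spm_importance(tokens, tokens.index("service"))  # AS-SPM
```

The tokens with the smallest distances are the candidates for replacement in the next step.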

Token Replacement Strategy
After applying SPM to the reviews, our model replaces the unimportant terms under the following two strategies. Auto-Encoding (AE): the AE model is initialized with the pre-trained weights of BERT (Devlin et al., 2019) and predicts the masked word. We use the predicted word to replace the unimportant term.
Sequence-to-Sequence (Seq2Seq): the Seq2Seq model is initialized with the pre-trained weights of BART (Lewis et al., 2020), which includes an encoder and a decoder. The word predicted by the decoder is adopted to replace the unimportant term.
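Both strategies follow the same mask-then-refill loop and differ only in the model that refills the mask. A minimal sketch, assuming SPM scores are already computed; `replace_unimportant` and the `fill` callback are illustrative names, and in practice `fill` would wrap a BERT masked-LM head (AE) or a BART encoder-decoder (Seq2Seq):

```python
from typing import Callable, Dict, List

def replace_unimportant(tokens: List[str], scores: Dict[int, float], k: int,
                        fill: Callable[[List[str], int], str]) -> List[str]:
    """Replace the k least-important tokens (lowest SPM distances).

    `fill(masked_tokens, position)` returns the predicted word for the
    [MASK] at `position`."""
    out = list(tokens)
    for i in sorted(scores, key=scores.get)[:k]:
        masked = list(out)
        masked[i] = "[MASK]"       # hide the token, then let the LM refill it
        out[i] = fill(masked, i)
    return out

tokens = "the staff was so kind to us".split()
scores = {3: 0.1, 5: 0.2, 1: 9.0}  # pretend SPM found "so" and "to" unimportant
aug = replace_unimportant(tokens, scores, 2, lambda m, i: "really")
# → ['the', 'staff', 'was', 'really', 'kind', 'really', 'us']
```

Because the aspect and sentiment positions never appear in `scores`, they survive replacement unchanged, which is what keeps the aspect-level polarity intact.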
In our experiments, we use the BERT-base-uncased model to show the performance with and without the proposed augmentation methods. Additionally, we compare against commonly used data augmentation methods, including Back Translation (BT) (Edunov et al., 2018), Easy Data Augmentation (EDA) (Wei and Zou, 2019), and C-BERT (Wu et al., 2019). In the ACSC, ATSC, and SC tasks, we double the original training set in size. In the ATE task, we augment the reviews according to the number of aspect terms. Accuracy is adopted as the evaluation metric for the ACSC, ATSC, and SC tasks; F1-score is used for the ATE task.

Experimental Results
We report the averaged results across five random seeds in Table 2, with standard deviations shown in subscripts. We do not apply EDA to the ATE task because the insertion and deletion operations are not suitable for token-level tasks. Firstly, we find that the combinations of the SPM settings (AS-SPM and Senti-SPM) and token replacement strategies (AE and Seq2Seq) achieve better performance in all settings with more stable results (lower standard deviations), which indicates that our augmentation methods are effective. Secondly, some approaches slightly harm performance on some datasets. For example, using BT and EDA on ACSC-Rest14 and ATSC-Rest16 yields lower performance than vanilla BERT, and using C-BERT on ATSC-Rest15 and ATSC-Rest16 yields lower performance than vanilla BERT. Additionally, the proposed SPM consistently outperforms the random masking strategy (C-BERT). Thirdly, the proposed Seq2Seq replacement strategy performs well on ACSC and ATSC, while AE achieves the best results on ATE.

Multilingual Experiment
In this section, we utilize Google Translate to translate the corresponding auxiliary sentences and experiment on the multilingual ABSA datasets (Pontiki et al., 2016); the results are reported in Table 3. In most languages, the proposed method yields larger improvements than the other data augmentation methods.

Multi-Aspect Multi-Sentiment Experiment
Jiang et al. (2019) propose a multi-aspect multi-sentiment dataset, MAMS, in which each instance is annotated with different sentiments on at least two aspects. They claim that it is more challenging than the datasets in previous works. We further experiment on this dataset with the proposed method and report the results in Table 5. These results support that the proposed method is also helpful on this more challenging dataset for both the ATSC and ACSC tasks.

Figure 1 shows the results on ACSC-Rest14 under different multiples of the training set size. Models perform better than those using only the original training set when the multiple is between 1 and 2. The figure also shows that using too much training data generated by the proposed method harms ABSA performance, because the generated data may contain too much noise.

Table 6 presents the augmented results of different approaches, which show that previous approaches may change the aspect or sentiment of an instance:

Original review: But the staff was so horrible to us.
BT: But the staff were so awful for us.
EDA: But so staff was the ugly to uranium.
C-BERT: But the situation was being good to me.
AS-SPM & AE: But the staff was always horrible to me.
Senti-SPM & AE: But the situation was so horrible to me.

EDA generates an unmeaningful sentence; although "ugly" could be a hint for negative sentiment, the aspect of the generated instance is changed. C-BERT shows the other failure case: the sentiment of the generated review is changed. Both cases show the importance of the proposed idea, i.e., controlling the aspect or sentiment word when generating a new sentence. In contrast, the proposed approach is controllable: when using AS-SPM, the aspect is not changed, and when using Senti-SPM, the generated review keeps the same sentiment polarity as the original review.

Stock Price/Risk Movement Prediction
In financial markets, return and risk are the two aspects most investors focus on, so the task setting is similar to ABSA tasks. In this section, we discuss the experiments on stock price/risk movement prediction on the benchmark dataset StockNet (Xu and Cohen, 2018). We ask models to predict whether the return (risk) will increase or decrease after $n$ days, where the return $\mu_t = (P_t - P_{t-n}) / P_{t-n}$ and $P_t$ is the close price at time $t$. The risk $\sigma_t$ is defined as the standard deviation of returns over the same window, and $n$ is 3. In total, 19,107 instances are included in StockNet; we use 85% of the instances as the training set and the rest as the test set. Accuracy (ACC.) is adopted as the evaluation metric. The auxiliary sentences in this experiment differ from those in the ABSA tasks; for example, when predicting risk movement, the auxiliary sentence is "Market risk will [MASK]". In accordance with the task, there are two kinds of SPM, i.e., Return-SPM and Risk-SPM. Table 7 shows the experimental results: the proposed methods outperform the other data augmentation methods on both the price and risk movement prediction tasks.
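The return and risk quantities can be computed as follows. This is a sketch under stated assumptions: the function names are illustrative, and since the paper's exact risk formula is elided, `n_day_risk` uses the standard deviation of daily returns, a common volatility proxy.

```python
import statistics

def n_day_return(prices, t, n=3):
    """mu_t = (P_t - P_{t-n}) / P_{t-n}, with `prices` the close-price series."""
    return (prices[t] - prices[t - n]) / prices[t - n]

def n_day_risk(prices, t, n=3):
    """sigma_t: standard deviation of 1-day returns over the past n days
    (assumed definition; the paper's exact formula is not shown)."""
    daily = [(prices[s] - prices[s - 1]) / prices[s - 1]
             for s in range(t - n + 1, t + 1)]
    return statistics.pstdev(daily)

def movement(x_prev, x_now):
    """Binary label for the prediction task: increase or decrease."""
    return "increase" if x_now > x_prev else "decrease"

prices = [100, 102, 101, 105, 110]
print(n_day_return(prices, 3))  # → 0.05, i.e. a 5% three-day return
```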

Influence of Auxiliary Sentence
In this section, we discuss an interesting research question: whether different auxiliary sentences influence performance. For example, how good is the performance if we only use a simple "[MASK]" as the auxiliary sentence, and how does tense influence performance? We use the experiments on stock risk movement prediction as an example. We find that using an auxiliary sentence containing the task-specific term ("Risk") performs better than using the simple "[MASK]" tag only. Additionally, the tense of the auxiliary sentence is also influential. Since "risk" may relate to different issues, we find that adding an issue-specific term ("Market") provides a slight improvement. In sum, our experiments show the importance of selecting auxiliary sentences in the data augmentation process.

Conclusion
In this paper, we present a controllable augmentation method for ABSA, which generates reasonable reviews without altering the aspect-level polarity. We propose SPM to measure the impact of related words on a specific aspect and sentiment, and apply two replacement strategies to ABSA tasks. Experimental results show the effectiveness and robustness of our approach. Additionally, the exploration of the financial application scenario also supports the usefulness of the proposed method. In the future, we plan to apply the proposed method to data augmentation on longer documents and to generating training instances for low-resource languages.