Learning Cooperative Interactions for Multi-Overlap Aspect Sentiment Triplet Extraction



Unstructured text:
The ease of use and the top (slightly expensive) service from Apple never disappoint.

Aspect sentiment triplets: (use, ease, positive), (use, never disappoint, positive), (service, never disappoint, positive), (service, slightly expensive, negative), (service, top, positive)

Table 1: Overlapped triplets under multi-aspect and multi-opinion. Multi-overlap triplets are shown in the dotted box. Aspects, opinions, and sentiments are marked in red, blue, and green.
the complex relations between aspects and opinions. Table 1 shows an example: "The ease of use and the top (slightly expensive) service from Apple never disappoint". The aspect "use" has two opinions, "ease" and "never disappoint", whereas the aspect "service" has three opinions, "top", "slightly expensive", and "never disappoint". The opinion "never disappoint" is shared by "use" and "service". The multi-overlap triplets are shown in the dotted box of Table 1; they contain both an overlapped aspect and an overlapped opinion. Multi-overlap triplets are thus more challenging than other overlapped triplets in capturing the relations between aspects and opinions. Most existing methods struggle with multi-overlap triplets and therefore cannot fully solve the ASTE task.
There are two major research lines on the ASTE task: tag-aware methods and span-aware methods. Tag-aware methods utilize tagging schemes to identify the three factors of a triplet. However, most of them (Peng et al., 2020; Xu et al., 2020) cannot handle words that require multiple tags because they assign a fixed tag to each word. Therefore, they fail to extract multi-overlap triplets. Besides, these methods (Wu et al., 2020; Chen et al., 2021b) suffer from triplets with multi-word spans because they focus on the interactions between words. Span-aware methods are free from the trouble of multi-word spans because they consider whole spans and identify their start and end boundaries. Span-aware methods mainly include question-driven (Mao et al., 2021; Gao et al., 2021) and joint generation (Yan et al., 2021; Mukherjee et al., 2021) approaches. However, these methods also fail to solve multi-overlap triplets because they focus on the single interaction between an aspect and an opinion when extracting triplets.
In short, most existing methods are plagued by multi-overlap triplets. In this paper, we propose an effective multi-overlap triplet extraction method, which decodes the complex relations between multiple aspects and opinions by learning their cooperative interactions. Overall, we adopt an encoder-decoder architecture. A joint decoding mechanism (JDM) is designed for the decoding process: it employs a multi-channel strategy to learn cooperative interactions between multiple aspects and opinions and to promote their generation in different channels. Furthermore, we construct a correlation-enhanced network (CEN) by encoding the context with dependency relations, reinforcing the interactions between related aspects and opinions when predicting their sentiments. Besides, we design a relation-wise calibration scheme to filter out unfaithful triplets and alleviate error propagation. Our method can effectively solve overlapped triplets, especially multi-overlap triplets.
Our contributions are summarized as follows: • We propose a multi-overlap triplet extraction method, which decodes the complex relations between aspects and opinions by learning their cooperative interactions. Our method can effectively solve multi-overlap triplets.
• We design a joint decoding mechanism, which employs a multi-channel strategy to capture the cooperative interactions between multiple aspects and opinions and promote their generation in different channels.
• We construct a correlation-enhanced network to enhance the interactions between related aspects and opinions when predicting their sentiments.
• Extensive experiments show that our method outperforms the baselines. Besides, it achieves significant improvement on multi-overlap triplets.

Related work
The ASTE task includes three fundamental tasks: aspect term extraction (Xu et al., 2018), opinion term extraction (Yu et al., 2018), and aspect-oriented sentiment classification (Pontiki et al., 2016). These fundamental tasks are key points in solving the ASTE task. As a compound task, ASTE has two main research lines: tag-aware methods (Xu et al., 2020; Zhang et al., 2020a; Chen et al., 2021b) and span-aware methods (Chen et al., 2021a; Mao et al., 2021; Xu et al., 2021). Below, we introduce related work on each research line.
Tag-aware methods assign a single tag to each word via tagging schemes. Peng et al. (2020) utilize two BIEOS-based sequence tagging schemes to extract aspect-sentiment pairs and opinions and then identify their relations. Xu et al. (2020) leverage a unified sequence tagging scheme to jointly extract the three factors of a triplet. Chen et al. (2021b) propose a grid tagging scheme to tag relations between word-word pairs and fill a sentiment relation table. However, these methods limit the interactions between aspects and opinions by assigning a fixed tag to each word, and they ignore the impact of the relations between multiple aspects and opinions. Therefore, they fail to solve multi-overlap triplets.
Span-aware methods identify aspect and opinion spans by considering their start and end boundaries. Mao et al. (2021) propose a question-driven method based on a reading comprehension scheme. They select one or more answers to a question to extract the three factors of a triplet. Then, Chen et al. (2021a) propose a bidirectional question-driven method to solve the ASTE task. However, these methods focus on the interactions between a question and its answers. Therefore, they still cannot fully solve multi-overlap triplets. Mukherjee et al. (2021) propose a generation method based on a recurrent neural network (RNN) to decode entire triplets. However, they also cannot fully solve multi-overlap triplets due to the limitation of the single interaction between an aspect and an opinion. Yan et al. (2021) propose a unified generation framework to extract triplets through a sequence output. However, they still suffer from multi-overlap triplets.
Unlike the above methods, our method considers the complex relations between aspects and opinions and captures their cooperative interactions to solve multi-overlap triplets.

Task Formulation
For the ASTE task, we are given a sentence X = {x_1, x_2, ..., x_n}, where x_i is the i-th word and n is the length of the sentence. We use a, o, and s to represent aspect spans, opinion spans, and sentiment polarities, respectively. The superscripts s and e denote the start and end positions of a span, so (a^s, a^e) denotes the span of an aspect and (o^s, o^e) denotes the span of an opinion. Besides, s^p denotes a sentiment polarity, where p ∈ {Positive, Neutral, Negative}. Each aspect sentiment triplet is defined as a 5-tuple T_i = (a^s_i, a^e_i, o^s_i, o^e_i, s^p_i). The ASTE task aims to extract all aspect sentiment triplets in a text.
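The 5-tuple formulation above can be made concrete with a small container. This is an illustrative sketch only; the class and field names are our own, not part of the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    """Mirrors the paper's 5-tuple T_i = (a^s, a^e, o^s, o^e, s^p)."""
    a_start: int   # start word index of the aspect span
    a_end: int     # end word index of the aspect span (inclusive)
    o_start: int   # start word index of the opinion span
    o_end: int     # end word index of the opinion span (inclusive)
    polarity: str  # one of "Positive", "Neutral", "Negative"

sentence = ["The", "ease", "of", "use", "never", "disappoint"]
t = Triplet(a_start=3, a_end=3, o_start=1, o_end=1, polarity="Positive")
aspect = " ".join(sentence[t.a_start:t.a_end + 1])
opinion = " ".join(sentence[t.o_start:t.o_end + 1])
print(aspect, opinion)  # use ease
```

Note that spans are inclusive on both ends, matching the paper's (start, end) boundary convention.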

Model Architecture
As shown in Figure 1, our method consists of a representation encoder (RE), a joint decoding mechanism (JDM), and a correlation-enhanced network (CEN). Specifically, the three components are as follows: RE. The RE component takes the Bart-encoder as a backbone; it constructs an input sentence and then encodes it to obtain contextualized hidden representations.
JDM. The JDM component consists of a sharing decoder unit with a pointer network and three channels (an aspect channel, an opinion channel, and an auxiliary channel). The sharing decoder unit takes the Bart-decoder as a backbone and replicates the same structure across the three channels so that they share parameters during training. The JDM component jointly trains and optimizes the three channels through the sharing decoder unit and generates candidate aspect and opinion spans.
CEN. The CEN component consists of an interaction enhancement module and a relation-wise calibration scheme. The interaction enhancement module encodes the context with dependency relations to reinforce the interactions between related aspects and opinions while predicting their sentiments. Error propagation is alleviated by the relation-wise calibration scheme.

The JDM component
We divide the whole sentence into aspect spans, opinion spans, and other spans, and use (aux^s, aux^e) to represent the other spans. Then, we construct target sequences for the three channels. The target sequence of each channel consists of pointer indexes, which refer to the position indexes in a sentence. Figure 2 presents a sentence with pointer indexes and example target sequences for the three channels.
Each channel takes the hidden states H^E of the RE component and the previous outputs Ŷ_{<t} of the channel as inputs to obtain the next hidden state h^D_t:

h^D_t = Decoder(H^E; Ŷ_{<t})    (3)

where h^D_t ∈ R^d. The probability distribution P_t over word indexes is then computed against the token embeddings of the input sentence:

P_t = softmax(M^D h^D_t),  where  M^D = Decoder.embed_tokens(X)    (4)

During inference, we feed the start token [s] together with the channel symbol token (i.e., [AC], [OC], or [AuxC]) instead of a single [s] into the different channels to decode the first token of the target sequences. Besides, we use beam search to obtain the output sequences Ŷ in an auto-regressive manner.
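The pointer-index target sequences described above can be sketched as follows. The special-token index values and the offset convention are assumptions for illustration; the paper maps channel symbols and start/end tokens to class indexes but does not give concrete values.

```python
# Hypothetical sketch of building a channel's target sequence from spans.
# Special tokens are mapped to reserved class indexes 0..4 here; the real
# ids depend on the tokenizer, so these values are assumptions.
SPECIAL = {"[s]": 0, "[/s]": 1, "[AC]": 2, "[OC]": 3, "[AuxC]": 4}
OFFSET = len(SPECIAL)  # word pointer indexes start after the specials

def build_target(channel, spans):
    """channel: one of "[AC]", "[OC]", "[AuxC]".
    spans: list of (start, end) word positions in the sentence."""
    seq = [SPECIAL["[s]"], SPECIAL[channel]]
    for start, end in spans:
        seq += [start + OFFSET, end + OFFSET]  # pointer indexes
    seq.append(SPECIAL["[/s]"])
    return seq

# Aspect channel for aspects at word positions 3 and 9:
print(build_target("[AC]", [(3, 3), (9, 9)]))
# [0, 2, 8, 8, 14, 14, 1]
```

Each channel thus emits start/end pointer pairs bracketed by its channel symbol, which is what the shared decoder is trained to generate.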

The CEN component
The CEN component aims to identify the relations between aspects and opinions for multi-overlap triplets. It includes an interaction enhancement module and a relation-wise calibration scheme. First, the interaction enhancement module enhances the interactions between related aspects and opinions for sentiment prediction by encoding context dependencies. Then, the relation-wise calibration scheme is adopted for error rectification. The detailed descriptions are as follows.
The CEN component takes the hidden states H^E and the output sequences Ŷ^a and Ŷ^o as inputs. We convert Ŷ^a and Ŷ^o into aspect and opinion spans.

Interaction Enhancement
We utilize a graph convolutional network (GCN) to enhance the interactions between related aspects and opinions. First, we obtain dependency relations from the parser tree (Mrini et al., 2019). Then, we leverage the GCN to encode the context with dependency relations. The equation (Li et al., 2021) is as follows:

h^l_i = σ( Σ_{j=1}^{n} A_ij W^l h^{l-1}_j + b^l )

where A is a dependency probability matrix, and h^l_i is the i-th node at the l-th layer. The initial representation h^0_i comes from H^E. W^l and b^l are learnable parameters, and σ is an activation function.
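A minimal numpy sketch of one such GCN layer is shown below, assuming the standard formulation h^l = σ(A H W + b) with ReLU as the activation; the toy dependency matrix A is invented for illustration.

```python
import numpy as np

def gcn_layer(A, H, W, b):
    """One GCN layer: h_i = ReLU(sum_j A_ij W h_j + b).
    A: (n, n) dependency probability matrix, H: (n, d) node states,
    W: (d, d_out), b: (d_out,)."""
    return np.maximum(A @ H @ W + b, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
n, d = 5, 8
A = np.eye(n) * 0.5 + 0.1        # toy dependency probabilities
H = rng.normal(size=(n, d))      # initial states, playing the role of H^E
W = rng.normal(size=(d, d))
b = np.zeros(d)
# Two stacked layers, matching the paper's setting of 2 GCN layers.
H2 = gcn_layer(A, gcn_layer(A, H, W, b), W, b)
print(H2.shape)  # (5, 8)
```

In the paper, A comes from the dependency parser rather than being hand-built, and the two layers would have separate parameters W^1, W^2.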
We obtain the enhanced states from the GCN. Then, we concatenate the start word representations of the aspect and opinion spans to predict their sentiments. The GCN benefits from the dependency graph, but it heavily relies on the quality of the parser tree. In this component, the relation-wise calibration scheme is adopted to alleviate dependency parser errors. Therefore, the GCN can effectively reinforce the interactions between related aspects and opinions for sentiment prediction, which is crucial to solving multi-overlap triplets.

Relation-wise Calibration
We can obtain the representations of aspect spans and opinion spans based on H^E. Not every word plays an equal role in a multi-word span representation. Therefore, we use self-attention to convert a multi-word span representation into a single vector, emphasizing the important words. Let sr denote the span representation:

A_sr = softmax(W^2_sr tanh(W^1_sr M^T_sr)),  sr = A_sr M_sr

where M_sr ∈ R^{L×d} denotes all word representations of the span from start position i to end position j, and L denotes the number of words in the span. A_sr ∈ R^L is a weight vector, and sr ∈ R^d. W^1_sr and W^2_sr are learnable parameters. We let sr_a and sr_o denote the representations of an aspect span and an opinion span. Then, we perform the Cartesian product on aspect spans and opinion spans to obtain candidate aspect-opinion pair representations. The candidate set is SR_ao = {sr^1_ao, sr^2_ao, ..., sr^q_ao}, where the q-th pair representation is

sr^q_ao = [sr^i_a ; sr^j_o ; sr^i_a − sr^j_o]

where sr^i_a denotes the i-th aspect span representation, sr^j_o denotes the j-th opinion span representation, and sr^i_a − sr^j_o denotes the difference between them.
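The self-attentive span pooling and the pair construction can be sketched as below. The exact attention parameterization is an assumption (the paper names W^1_sr and W^2_sr but omits the activation), so this is a standard self-attentive pooling, not a definitive reproduction.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def span_repr(M, W1, W2):
    """Self-attention pooling of a multi-word span, assuming
    A = softmax(W2 tanh(W1 M^T)) and sr = A M.
    M: (L, d) word representations of the span."""
    A = softmax(W2 @ np.tanh(W1 @ M.T))  # (L,) weights over span words
    return A @ M                          # (d,) pooled span vector

def pair_repr(sr_a, sr_o):
    """Aspect-opinion pair: concat of both spans and their difference."""
    return np.concatenate([sr_a, sr_o, sr_a - sr_o])

rng = np.random.default_rng(1)
d, hidden = 6, 4
W1, W2 = rng.normal(size=(hidden, d)), rng.normal(size=hidden)
aspect = span_repr(rng.normal(size=(2, d)), W1, W2)   # 2-word aspect span
opinion = span_repr(rng.normal(size=(3, d)), W1, W2)  # 3-word opinion span
print(pair_repr(aspect, opinion).shape)  # (18,)
```

Taking all aspect-opinion combinations of `pair_repr` over the decoded spans yields the candidate set SR_ao that the calibration classifier scores.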
Then, we use two linear layers to identify related aspect-opinion pairs from the candidate set, serving as a calibration scheme that identifies unfaithful triplets. The equation is as follows:

r = softmax(W^2_r f(W^1_r sr_ao + b^1_r) + b^2_r)

where W^1_r, b^1_r, W^2_r, and b^2_r are learnable parameters, f(·) denotes a non-linear activation function, and r ∈ {Valid, Invalid}.
We utilize the negative log-likelihood to optimize L_CEN:

L_CEN = − Σ ( log P(r = r*) + log P(m = m*) )

where r* is the validity label distinguishing positive and negative instances, and m* is the sentiment relation between an aspect and an opinion.

Training
The training objective is the sum of the losses of the JDM component and the CEN component:

L = L_JDM + α L_r + β L_m    (17)

where L_r and L_m are the calibration and sentiment terms of L_CEN, and α and β are hyperparameters.
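A sketch of the joint objective follows. How α and β attach to the loss terms is an assumption here (the original Equation 17 did not survive extraction); we assume the CEN loss splits into a calibration term and a sentiment term, with the values from the Implementation Details section.

```python
def joint_loss(l_jdm, l_r, l_m, alpha=0.1, beta=0.3):
    """Hypothetical joint objective L = L_JDM + alpha*L_r + beta*L_m.
    alpha=0.1 and beta=0.3 follow the paper's stated hyperparameters;
    the grouping of terms is an assumption, not the confirmed Eq. 17."""
    return l_jdm + alpha * l_r + beta * l_m

print(joint_loss(2.0, 1.0, 1.0))  # 2.4
```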

Datasets
We evaluate our method on the D_20a dataset (Peng et al., 2020) and the D_20b dataset (Xu et al., 2020).

Baselines
The baselines can be summarized as two groups: tag-aware methods and span-aware methods.
Tag-aware methods. The RINANTE model (Dai and Song, 2019) and the Li-unified model (Li et al., 2019) use sequence tagging schemes to solve the aspect-opinion pair extraction task and the aspect-sentiment pair extraction task. Peng et al. (2020) modify them into RINANTE+ and Li-unified+ for the ASTE task. The Peng-stage model (Peng et al., 2020) utilizes two sequence tagging schemes to jointly solve the ASTE task. The JET model (Xu et al., 2020) designs a joint tagging method to identify triplets. The GTS model (Wu et al., 2020) utilizes a table filling method to fill a sentiment relation table.
Span-aware methods. The Dual-MRC model (Mao et al., 2021) transforms the ASTE task into a reading comprehension scheme to extract aspects, opinions, and their corresponding sentiment polarities. The BART model (Yan et al., 2021) utilizes a unified framework to decode triplets through a sequence output. The PASTE model (Mukherjee et al., 2021) leverages an RNN to construct a generative structure, which generates an entire triplet at each time step.

Implementation Details
The uncased English version of BART base is our backbone. We conduct experiments on a single GPU (Nvidia GeForce RTX 2080 Ti) with CUDA version 11.4. The model is trained for 30 epochs with a batch size of 8, a linear warmup of 1e-1, and a weight decay of 1e-2. We use the AdamW optimizer with a learning rate of 5e-5. The dropout rate is 0.5 in Equation 5, and ReLU is the primary activation function. The number of GCN layers is set to 2. We fix the hyperparameters α and β to 0.1 and 0.3 for the joint training loss in Equation 17. We report the average results of five runs with different random seeds.
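The learning-rate schedule can be sketched as below. Interpreting "linear warmup of 1e-1" as warming up over the first 10% of training steps is an assumption; the paper does not spell out the schedule's exact form.

```python
def linear_warmup_lr(step, total_steps, base_lr=5e-5, warmup_frac=0.1):
    """Linear warmup followed by linear decay, a common schedule for
    BART fine-tuning. warmup_frac=0.1 is our reading of the paper's
    'linear warmup of 1e-1' (an assumption)."""
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return base_lr * (step + 1) / warmup          # ramp up
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup))

print(linear_warmup_lr(9, 100))   # 5e-05 (end of warmup)
print(linear_warmup_lr(100, 100)) # 0.0   (end of training)
```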

Main Results
As aforementioned, there are two research lines on the ASTE task: tag-aware methods and span-aware methods. For each research line, we compare our method with the above baselines and report the results in Table 4 and Table 5. First, to conduct a detailed evaluation of the ASTE (aspect, opinion, sentiment) task, we take AOPE (aspect, opinion) and ASPE (aspect, sentiment) as special cases of ASTE to verify the effectiveness of our method on D_20a. Under the F1 metric, we report the results in Table 4 and highlight the best results in bold. Our method dramatically improves on all three tasks: 2.06 F1 points for ASPE, 2.07 F1 points for AOPE, and 3.52 F1 points for ASTE on average. The results verify the effectiveness of our method on the ASTE task and its special cases.
Second, we use precision, recall, and F1 score to further evaluate our method on the ASTE task on D_20b, which presents a more challenging scenario for overlapped triplets. The results are presented in Table 5. Overall, we still obtain remarkable improvement on all four datasets. Our method outperforms the best tag-aware method (GTS) by an average of 5.13 F1 points, and outperforms the best span-aware method (PASTE) by an average of 4.39 F1 points. Besides, we observe that most span-aware methods are superior to tag-aware methods. Tag-aware methods suffer from triplets with multi-word spans because they focus on the interactions between words. Span-aware methods are free from this trouble, but most of them do not consider the impact of the complex relations between multiple aspects and opinions while decoding the three factors of a triplet. Therefore, their performance is worse than our method's on the ASTE task. In conclusion, all results verify the effectiveness of our method.

Overlapped Triplets Analysis
We compare our method with PASTE to verify the effectiveness on overlapped triplets. Besides, we evaluate the performance on aspect-overlapped triplets and opinion-overlapped triplets separately to further localize the improvement. The experimental results are shown in Table 6. Our method outperforms PASTE on overlapped triplets, achieving 6.11, 7.5, 7.89, and 5.78 F1 improvements on the four datasets. Besides, we obtain an average of 8.82 F1 improvement on aspect-overlapped triplets and an average of 4.39 F1 improvement on opinion-overlapped triplets. The gap between them stems from the imbalanced data distribution. We also observe that PASTE shows a better recall on opinion-overlapped triplets on 16res, but it is still worse than our method due to low precision. This suggests that we perform well on overlapped triplets, including both aspect-overlapped and opinion-overlapped triplets. In short, all results show that our method achieves significant improvement on overlapped triplets.

Multi-Overlap Triplets Analysis
To verify the effectiveness on multi-overlap triplets, we evaluate the performance on D_20b. 'Res' is a combined dataset from 14res, 15res, and 16res, and 'Lap' comes from 14lap. The results are reported in Table 7. Obviously, our method can effectively solve multi-overlap triplets. We obtain convincing improvements of 6.75 F1 points on the Lap dataset and 6.88 F1 points on the Res dataset. Compared to precision, our recall achieves a more remarkable improvement over PASTE; notably, we surpass PASTE by as much as 8.91 recall points on the Res dataset. PASTE focuses on the single interaction between an aspect and an opinion when extracting triplets, whereas our method considers the cooperative interactions between multiple aspects and opinions. Therefore, we can better capture the complex relations between aspects and opinions.
In conclusion, we gain significant improvement on multi-overlap triplets.

Joint Decoding Mechanism Efficiency
In the JDM component, we employ the joint decoding of aspects and opinions to promote their generation.

Cases Study
We compare our method with PASTE on two cases in Table 9. The first case contains four multi-overlap triplets, and we extract all of them, whereas PASTE misses a triplet. The result indicates that our method can model the complex relations between multiple aspects and opinions. For the second case, both our method and PASTE make a mistake on the triplet (goat cheese, expensive, NEG) when predicting its sentiment. The main reason is the imbalanced distribution between positive and negative triplets. Overall, we still outperform PASTE because it misses a multi-overlap triplet again. In short, both cases demonstrate that we perform well on multi-overlap triplets.

Ablation Study
We conduct an ablation study to examine the rationality of our method's design, and the results are reported in Table 10. The average F1 denotes the results of our method on the four datasets over 5 runs. We remove the auxiliary channel, the sharing decoder unit, the CEN component, and the relation-wise calibration in turn. The results indicate that the absence of any part decreases performance. In particular, performance drops by 5.20 F1 points on average when the CEN component is replaced with two linear layers. In short, the design of our method is reasonable and achieves the best performance.

Conclusion
We propose a multi-overlap triplet extraction method that explores the complex relations between multiple aspects and opinions by learning their cooperative interactions. It addresses the limitation that most methods focus on the single interaction between an aspect and an opinion while decoding the three factors of a triplet. The ATE and OTE tasks are solved in the decoding process through a joint decoding mechanism, and a correlation-enhanced network reinforces the interactions between related aspects and opinions while predicting their sentiments. Our method obtains convincing improvements on overlapped triplets, especially multi-overlap triplets.

Limitations
Though we obtain convincing performance on multi-overlap triplet extraction, the high time cost is an obvious limitation. Multi-overlap triplet extraction is time-consuming because it decodes complex relations among multiple aspects and opinions. Inevitably, our method may be slightly slower than several previous methods. In follow-up work, we will pay more attention to time consumption.

Figure 1: The overall architecture consists of the RE, JDM, and CEN components. The top-left green dotted box refers to the CEN component, the bottom-left red dotted box to the RE component, and the right blue dotted box to the JDM component.

Figure 2: The target sequence examples for the aspect channel, the opinion channel, and the auxiliary channel. During training, we convert the channel symbol tokens (i.e., [AC], [OC], and [AuxC]) and the special start and end tokens (i.e., [s] and [/s]) to their corresponding class indexes.
m = softmax(W^2_m f(W^1_m sr^G_ao + b^1_m) + b^2_m)    (10)

where sr^G_ao denotes the concatenation of the start word representations of an aspect span and an opinion span. W^1_m, b^1_m, W^2_m, and b^2_m are learnable parameters, and m ∈ {Positive, Negative, Neutral, None}.
where d is the hidden dimension. An example input sentence is: <CLS> I enjoyed a caesar salad while my wife had ( expensive ) goat cheese - both very tasty . <SEP>. To identify different channels, we add the channel symbol tokens [AC], [OC], and [AuxC] to the corresponding target sequences, and a special start token [s] and end token [/s] are added to each target sequence. For example, the auxiliary channel's target sequence has the form Y^aux = [s], [AuxC], aux^s_1, aux^e_1, ..., aux^s_k, aux^e_k, [/s]. P_t ∈ R^n is the probability distribution over word indexes of the sentence. During training, we define the target sequence of the aspect channel as Y^a = {y^a_1, y^a_2, ..., y^a_{T_a}}, the target sequence of the opinion channel as Y^o = {y^o_1, y^o_2, ..., y^o_{T_o}}, and the target sequence of the auxiliary channel as Y^aux = {y^aux_1, y^aux_2, ..., y^aux_{T_aux}}. Then, we utilize cross-entropy for optimization with L_JDM:

L_JDM = − Σ_{c ∈ {a, o, aux}} Σ_{t=1}^{T_c} log P(y^c_t | y^c_{<t}, X)

Table 2: The sentence-level statistics of the four datasets in D_20a and D_20b. #S denotes the overall number of sentences. #MulPol denotes the number of sentences with triplets of different sentiments. #OverLap denotes the number of sentences with overlapped triplets. The items 1, 2, 3, 4, and ≥5 denote the number of sentences containing that many triplets.

Table 3: The triplet-level statistics of the four datasets in D_20a and D_20b, where #T denotes the number of triplets.
These two datasets include three sub-datasets (14res, 15res, 16res) in the restaurant domain and one sub-dataset (14lap) in the laptop domain. D_20b is the revised version of D_20a and includes more overlapped triplets. Specifically, the proportion of sentences with overlapped triplets is 21.00%, 17.62%, 19.26%, and 19.02% for D_20a, and 32.54%, 29.66%, 25.86%, and 25.48% for D_20b, on 14res, 14lap, 15res, and 16res, respectively. We give the sentence-level and triplet-level statistics of D_20a and D_20b in Tables 2 and 3. The datasets and codes are available 1 .

Table 4: Comparison of F1 score for the ASPE, AOPE, and ASTE tasks on D_20a. The baseline results are retrieved from Li et al. (2019). We highlight the best results in bold.

Table 7: Comparison results for multi-overlap triplets on the Lap and Res datasets of D_20b.

Table 8: Comparison of precision (P), recall (R), and F1 score for the ATE and OTE tasks on D_20b.

Table 9: Case study for the ASTE task in the laptop and restaurant domains. Red and blue indicate aspects and opinions. 'POS', 'NEU', and 'NEG' indicate positive, neutral, and negative sentiments.

Table 10: Comparison of average F1 score for the ablation study on D_20b.