PASTE: A Tagging-Free Decoding Framework Using Pointer Networks for Aspect Sentiment Triplet Extraction

Aspect Sentiment Triplet Extraction (ASTE) deals with extracting opinion triplets, each consisting of an opinion target or aspect, its associated sentiment, and the corresponding opinion term/span explaining the rationale behind the sentiment. Existing research efforts are predominantly tagging-based. Among the methods taking a sequence tagging approach, some fail to capture the strong interdependence between the three opinion factors, whereas others fall short of identifying triplets with overlapping aspect/opinion spans. A recent grid tagging approach, on the other hand, fails to capture the span-level semantics while predicting the sentiment between an aspect-opinion pair. Different from these, we present a tagging-free solution for the task that addresses the limitations of the existing works. We adapt an encoder-decoder architecture with a Pointer Network-based decoding framework that generates an entire opinion triplet at each time step, thereby making our solution end-to-end. Interactions between the aspects and opinions are effectively captured by the decoder by considering their entire detected spans while predicting their connecting sentiment. Extensive experiments on several benchmark datasets establish the better efficacy of our proposed approach, especially in recall, and in predicting multiple and aspect/opinion-overlapped triplets from the same review sentence. We report our results both with and without BERT and also demonstrate the utility of domain-specific BERT post-training for the task.


Introduction
Aspect-based Sentiment Analysis (ABSA) is a broad umbrella of several fine-grained sentiment analysis tasks, and has been extensively studied since its humble beginning in SemEval 2014 (Pontiki et al., 2014a). Overall, the task revolves around automatically extracting the opinion targets or aspects being discussed in review sentences, along with the sentiments expressed towards them. Early efforts on Aspect-level Sentiment Classification (Tay et al., 2018; Li et al., 2018a; Xue and Li, 2018) focus on predicting the sentiment polarities for given aspects. However, in a real-world scenario, aspects may not be known a priori. Works on End-to-End ABSA (Li et al., 2019; He et al., 2019; Chen and Qian, 2020) thus focus on extracting the aspects as well as the corresponding sentiments simultaneously. These methods do not, however, capture the reasons behind the expressed sentiments, which could otherwise provide valuable clues for more effective extraction of aspect-sentiment pairs.

Table 1
Sent 1: The film was good , but could have been better .
Triplets [Aspect ; Opinion ; Sentiment]: (1) film ; good ; positive (2) film ; could have been better ; negative
Sent 2: The weather was gloomy , but the food was tasty .
Triplets [Aspect ; Opinion ; Sentiment]: (1) weather ; gloomy ; negative (2) food ; tasty ; positive

* Equal contribution
Consider the two examples shown in Table 1. For the first sentence, the sentiment associated with the aspect film changes depending on the connecting opinion phrase: good suggests a positive sentiment, whereas could have been better indicates a negative sentiment. Hence, simply extracting the pairs film-positive and film-negative without additionally capturing the reasoning phrases may confuse the reader. For the second sentence, the opinion term gloomy has a higher probability of being associated with weather than with food. We therefore observe that the three elements or opinion factors of an opinion triplet are strongly interdependent. In order to offer a complete picture of what is being discussed, what the sentiment is, and why it is so, Peng et al. (2020) defined the task of Aspect Sentiment Triplet Extraction (ASTE). Given an opinionated sentence, it deals with extracting all three elements: the aspect term/span, the opinion term/span, and the connecting sentiment, in the form of opinion triplets as shown in Table 1. It is to be noted here that a given sentence might contain multiple triplets, which may further share aspect or opinion spans (e.g., the two triplets for Sent. 1 in Table 1 share the aspect film). An efficient solution for the task must therefore be able to handle such challenging data points.
Peng et al. (2020) propose a two-stage pipeline framework. In the first stage, they extract aspect-sentiment pairs and opinion spans using two separate sequence-tagging tasks, the former leveraging a unified tagging scheme proposed by Li et al. (2019), and the latter based on the BIEOS¹ tagging scheme. In the second stage, they pair up the extracted aspect and opinion spans, and use an MLP-based classifier to determine the validity of each generated triplet. Zhang et al. (2020) propose a multi-task framework to jointly detect aspects, opinions, and sentiment dependencies. Although they decouple the sentiment prediction task from aspect extraction, they use two separate sequence taggers (BIEOS-based) to detect the aspect and opinion spans in isolation before predicting the connecting sentiment. Both these methods, however, break the interaction between aspects and opinions during the extraction process. While the former additionally suffers from the error propagation problem, the latter, relying on word-level sentiment dependencies, cannot guarantee sentiment consistency over multi-word aspect/opinion spans. Xu et al. (2020b) propose a novel position-aware tagging scheme (extending BIEOS tags) to better capture the interactions among the three opinion factors. One of their model variants, however, cannot detect aspect-overlapped triplets, while the other cannot identify opinion-overlapped triplets. Hence, they need an ensemble of two variants to be trained for handling all cases. Wu et al. (2020) try to address this limitation by proposing a novel grid tagging scheme-based approach. However, they end up predicting the relationship between every possible word pair, irrespective of how they are syntactically connected, thereby impacting the span-level sentiment consistency guarantees.
Different from all these tagging-based methods, we propose to investigate the utility of a tagging-free scheme for the task. Our innovation lies in formulating ASTE as a structured prediction problem. Taking motivation from similar sequence-to-sequence approaches proposed for joint entity-relation extraction (Nayak and Ng, 2020; Chen et al., 2021), semantic role labeling (Fei et al., 2021), etc., we propose PASTE, a Pointer Network-based encoder-decoder architecture for the task of ASTE. The pointer network effectively captures the aspect-opinion interdependence while detecting their respective spans. The decoder then learns to model the span-level interactions while predicting the connecting sentiment. An entire opinion triplet is thus decoded at every time step, thereby making our solution end-to-end. We note here, however, that the aspect and opinion spans can be of varying lengths, which makes the triplet decoding process challenging. For ensuring uniformity, we also propose a position-based representation scheme to be suitably exploited by our proposed architecture. Here, each opinion triplet is represented as a 5-point tuple, consisting of the start and end positions of the aspect and opinion spans, and the sentiment (POS/NEG/NEU) expressed towards the aspect. To summarize our contributions:
• We present an end-to-end tagging-free solution for the task of ASTE that addresses the limitations of previous tagging-based methods. Our proposed architecture, PASTE, not only exploits the aspect-opinion interdependence during the span detection process, but also models the span-level interactions for sentiment prediction, thereby truly capturing the inter-relatedness between all three elements of an opinion triplet.
• We propose a position-based scheme to uniformly represent an opinion triplet, irrespective of varying lengths of aspect and opinion spans.
• Extensive experiments on the ASTE-Data-V2 dataset (Xu et al., 2020b) establish the overall superiority of PASTE over strong state-of-the-art baselines, especially in predicting multiple and/or overlapping triplets. We also achieve significant (15.6%) recall gains in the process.

¹ BIOES is a commonly used tagging scheme for sequence labeling tasks, and denotes "begin, inside, outside, end and single" respectively.

Our Approach
Given the task of ASTE, our objective is to jointly extract the three elements of an opinion triplet, i.e., the aspect span, its associated sentiment, and the corresponding opinion span, while modeling their interdependence. Towards this goal, we first introduce our triplet representation scheme, followed by our problem formulation. We then present our Pointer Network-based decoding framework, PASTE, and finally discuss a few model variants. Through exhaustive experiments, we investigate the utility of our approach and present a performance comparison with strong state-of-the-art baselines.

Table 2
Sentence: Ambience was good , but the main course and service were disappointing .
Target Triplets: (0 0 2 2 POS) (6 7 11 11 NEG) (9 9 11 11 NEG)
Overlapping Triplets: (6 7 11 11 NEG) (9 9 11 11 NEG)

Triplet Representation
In order to address the limitations of BIEOS tagging-based approaches and to facilitate joint extraction of all three elements of an opinion triplet, we represent each triplet as a 5-point tuple, consisting of the start and end positions of the aspect span, the start and end positions of the opinion span, and the sentiment (POS/NEG/NEU) expressed towards the aspect. This allows us to model the relative context between an aspect-opinion pair, which would not be possible if the two spans were extracted in isolation. It further helps to jointly extract the sentiment associated with such a pair. An example sentence with triplets represented under the proposed scheme is shown in Table 2. As may be noted, such a scheme can easily represent triplets with overlapping aspect or opinion spans, possibly with varying lengths.
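As an illustration of the 5-point representation, the following minimal sketch converts (aspect phrase, opinion phrase, sentiment) annotations into position tuples for the Table 2 sentence. The helper names `find_span` and `encode_triplet` are hypothetical and not part of the paper; positions are assumed to be 0-based and inclusive, which matches the tuples listed in Table 2.

```python
from typing import List, Tuple

# (aspect_start, aspect_end, opinion_start, opinion_end, sentiment)
Triplet = Tuple[int, int, int, int, str]


def find_span(tokens: List[str], phrase: str) -> Tuple[int, int]:
    """Return the 0-based inclusive (start, end) of the first match of `phrase`."""
    words = phrase.split()
    for start in range(len(tokens) - len(words) + 1):
        if tokens[start:start + len(words)] == words:
            return start, start + len(words) - 1
    raise ValueError(f"phrase not found: {phrase!r}")


def encode_triplet(tokens: List[str], aspect: str, opinion: str, sentiment: str) -> Triplet:
    """Build the 5-point tuple for one (aspect, opinion, sentiment) annotation."""
    a_start, a_end = find_span(tokens, aspect)
    o_start, o_end = find_span(tokens, opinion)
    return (a_start, a_end, o_start, o_end, sentiment)


tokens = "Ambience was good , but the main course and service were disappointing .".split()
triplets = [
    encode_triplet(tokens, "Ambience", "good", "POS"),
    encode_triplet(tokens, "main course", "disappointing", "NEG"),
    encode_triplet(tokens, "service", "disappointing", "NEG"),
]
print(triplets)
```

The last two tuples share the opinion span (11, 11), which shows how overlapping triplets are represented without any special handling.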

Problem Formulation
To formally define the ASTE task, given a review sentence $s = \{w_1, w_2, ..., w_n\}$ with $n$ words, our goal is to extract a set of opinion triplets $T = \{t_i \mid t_i = [(s^{ap}_i, e^{ap}_i), (s^{op}_i, e^{op}_i), senti_i]\}_{i=1}^{|T|}$, where $t_i$ represents the $i$-th triplet and $|T|$ represents the size of the triplet set. For the $i$-th triplet, $s^{ap}_i$ and $e^{ap}_i$ respectively denote the start and end positions of its constituent aspect span, $s^{op}_i$ and $e^{op}_i$ respectively denote the start and end positions of its constituent opinion span, and $senti_i$ represents the sentiment polarity associated between them. Here, $senti_i \in \{POS, NEG, NEU\}$, where $POS$, $NEG$, and $NEU$ respectively represent the positive, negative, and neutral sentiments.

The PASTE Framework
We now present PASTE, our Pointer Network-based decoding framework for the task of Aspect Sentiment Triplet Extraction. Figure 2 gives an overview of our proposed architecture.

Sentence Encoder
As previously motivated, the association between an aspect, an opinion, and their connecting sentiment is highly contextual. This factor is more noteworthy in sentences containing multiple triplets with/without varying sentiment polarities and/or overlapping aspect/opinion spans. Long Short-Term Memory networks (LSTMs) (Hochreiter and Schmidhuber, 1997) are known for their context modeling capabilities. Similar to Nayak and Ng (2020) and Chen et al. (2021), we employ a Bidirectional LSTM (Bi-LSTM) to encode our input sentences. We use pre-trained word vectors of dimension $d_w$ to obtain the word-level features. We then note from Figure 1 that aspect spans are often characterized by noun phrases, whereas opinion spans are often composed of adjective phrases. Referring to the dependency tree in the same figure, the aspect and the opinion spans belonging to the same opinion triplet are often connected by the same head word. These observations motivate us to use both part-of-speech (POS) and dependency-based (DEP) features for each word.
More specifically, we use two embedding layers, $E_{pos} \in \mathbb{R}^{|POS| \times d_{pos}}$ and $E_{dep} \in \mathbb{R}^{|DEP| \times d_{dep}}$, to obtain the POS and DEP features of dimensions $d_{pos}$ and $d_{dep}$ respectively, with $|POS|$ and $|DEP|$ representing the sizes of the POS-tag and DEP-tag sets over all input sentences. All three features are concatenated to obtain the input vector representation $x_i \in \mathbb{R}^{d_w + d_{pos} + d_{dep}}$ corresponding to the $i$-th word in the given sentence $s = \{w_1, w_2, ..., w_n\}$. The vectors are passed through the Bi-LSTM to obtain the contextualized representations $h^E_i \in \mathbb{R}^{d_h}$. Here, $d_h$ represents the hidden state dimension of the triplet-generating LSTM decoder, as detailed in the next section. Accordingly, the hidden state dimensions of both the forward and backward LSTMs of the Bi-LSTM encoder are set to $d_h/2$.
For the BERT-based variant of our model, the Bi-LSTM is replaced by BERT (Devlin et al., 2019) as the sentence encoder. The pre-trained word vectors are accordingly replaced by BERT token embeddings. In this case, we append the POS and DEP feature vectors to the 768-dim. token-level outputs from the final layer of BERT.

Pointer Network-based Decoder
Referring to Figure 2, opinion triplets are decoded using an LSTM-based Triplet Decoder that takes into account the history of previously generated pairs/tuples of aspect and opinion spans, in order to avoid repetition. At each time step $t$, it generates a hidden representation $h^D_t \in \mathbb{R}^{d_h}$ that is used by the two Bi-LSTM + FFN-based Pointer Networks to respectively predict the aspect and opinion spans, while exploiting their interdependence. The tuple representation $tup_t$ thus obtained is concatenated with $h^D_t$ and passed through an FFN-based Sentiment Classifier to predict the connecting sentiment, thereby decoding an entire opinion triplet at the $t$-th time step. We now elaborate on each component of our proposed decoder framework in greater depth:

Span Detection with Pointer Networks
Our pointer network consists of a Bi-LSTM, with hidden dimension $d_p$, followed by two feed-forward layers (FFN) on top to respectively predict the start and end locations of an entity span. We use two such pointer networks to produce a tuple of hidden vectors corresponding to the aspect and opinion spans of the triplet to be decoded at time step $t$. We concatenate $h^D_t$ with each of the encoder hidden state vectors $h^E_i$ and pass them as input to the first Bi-LSTM. The output hidden state vector corresponding to the $i$-th token of the sentence thus obtained is simultaneously fed to the two FFNs with sigmoid to generate a pair of scores in the range of 0 to 1. After repeating the process for all tokens, the normalized probabilities of the $i$-th token being the start and end positions of an aspect span ($s^{p_1}_i$ and $e^{p_1}_i$, respectively) are obtained using softmax operations over the two sets of scores thus generated. Here, $p_1$ refers to the first pointer network. Similar scores corresponding to the opinion span are obtained using the second pointer network, $p_2$, the difference being that, apart from $h^D_t$, we also concatenate the output vectors from the first Bi-LSTM with the encoder hidden states $h^E_i$ and pass them as input to the second Bi-LSTM. This helps us to model the interdependence between an aspect-opinion pair. These scores are used to obtain the hidden state representations $ap_t \in \mathbb{R}^{2d_p}$ and $op_t \in \mathbb{R}^{2d_p}$ corresponding to the pair of aspect and opinion spans thus predicted at time step $t$. We refer readers to the appendix for more elaborate implementation details.
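The two-step scoring described above (a per-token sigmoid followed by a softmax over all tokens) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the raw per-token FFN outputs are taken as given lists of numbers, and the function name `pointer_probabilities` is hypothetical; the Bi-LSTM and FFN layers themselves are not modeled here.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_probabilities(
    start_outputs: List[float], end_outputs: List[float]
) -> Tuple[List[float], List[float]]:
    """Map raw per-token FFN outputs (assumed inputs) to start/end probabilities."""
    # Step 1: sigmoid squashes each token's score into the range (0, 1).
    start_scores = [sigmoid(z) for z in start_outputs]
    end_scores = [sigmoid(z) for z in end_outputs]
    # Step 2: softmax over all tokens yields normalized pointer probabilities.
    return softmax(start_scores), softmax(end_scores)


start_probs, end_probs = pointer_probabilities([2.0, -1.0, 0.5], [-2.0, 3.0, 0.0])
```

Both steps are monotonic, so the token with the highest raw output keeps the highest probability. Since the sigmoid scores lie in (0, 1), the ratio between any two resulting probabilities stays below e.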
Here we introduce the term generation direction, which refers to the order in which we generate the hidden representations for the two entities, i.e., the aspect and opinion spans. This allows us to define two variants of our model. The variant discussed so far uses $p_1$ to detect the aspect span before predicting the opinion span using $p_2$, and is henceforth referred to as PASTE-AF (AF stands for aspect first). Similarly, we obtain the second variant, PASTE-OF (opinion first), by reversing the generation direction. The other two components of our model remain the same for both variants.

Triplet Decoder and Attention Modeling
The decoder consists of an LSTM with hidden dimension $d_h$ whose goal is to generate the sequence of opinion triplets, $T$, as defined in Section 2.2. Let $tup_t = ap_t \Vert op_t$, $tup_t \in \mathbb{R}^{4d_p}$, denote the tuple (aspect, opinion) representation obtained from the pointer networks at time step $t$, where $\Vert$ denotes concatenation. Then, $tup_{prev} = \sum_{j<t} tup_j$, with $tup_0 = \vec{0} \in \mathbb{R}^{4d_p}$, represents the cumulative information about all tuples predicted before the current time step. We obtain an attention-weighted context representation of the input sentence at time step $t$ ($s^E_t \in \mathbb{R}^{d_h}$) using Bahdanau et al. (2015) attention². In order to prevent the decoder from generating the same tuple again, we pass $tup_{prev}$ as input to the LSTM along with $s^E_t$ to generate $h^D_t \in \mathbb{R}^{d_h}$, the hidden representation for predicting the triplet at time step $t$:

$$h^D_t = \mathrm{LSTM}(s^E_t \Vert tup_{prev},\ h^D_{t-1})$$

Sentiment Classifier
Finally, we concatenate $tup_t$ with $h^D_t$ and pass it through a feed-forward network with softmax to generate the normalized probabilities over $\{POS, NEG, NEU\} \cup \{NONE\}$, thereby predicting the sentiment label $senti_t$ for the current triplet. Interaction between the entire predicted spans of aspect and opinion is thus captured for sentiment identification. Here, $POS$, $NEG$, and $NEU$ respectively represent the positive, negative, and neutral sentiments. $NONE$ is a dummy sentiment that acts as an implicit stopping criterion for the decoder. During training, once a triplet with sentiment $NONE$ is predicted, we ignore all subsequent predictions, and none of them contribute to the loss. Similarly, during inference, we ignore any triplet predicted with the $NONE$ sentiment.
² Please refer to the appendix for implementation details.
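The role of the NONE label as a stopping signal, as described for the sentiment classifier, can be sketched as below. The function names are hypothetical. One detail is an assumption rather than something the text states: the step at which NONE first appears is itself kept in the training mask, and only the steps after it are masked out.

```python
from typing import List, Tuple

Triplet = Tuple[int, int, int, int, str]
NONE = "NONE"


def mask_after_none(sentiments: List[str]) -> List[int]:
    """Training-time mask: 1 for steps that contribute to the loss, 0 otherwise.

    Assumption: the first NONE step still contributes; later steps do not.
    """
    mask, stopped = [], False
    for label in sentiments:
        mask.append(0 if stopped else 1)
        if label == NONE:
            stopped = True
    return mask


def drop_none(decoded: List[Triplet]) -> List[Triplet]:
    """Inference-time filter: discard every triplet whose sentiment is NONE."""
    return [t for t in decoded if t[4] != NONE]


decoded = [(0, 0, 2, 2, "POS"), (6, 7, 11, 11, "NEG"), (3, 3, 4, 4, NONE)]
print(mask_after_none(["POS", "NEG", NONE, NONE]))
print(drop_none(decoded))
```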

Training
For training our model, we minimize the sum of negative log-likelihood losses for classifying the sentiment and the four pointer locations corresponding to the aspect and opinion spans:

$$\mathcal{L} = -\frac{1}{M \times J} \sum_{m=1}^{M} \sum_{j=1}^{J} \left[ \log\left(s^{m,j}_{ap} \cdot e^{m,j}_{ap}\right) + \log\left(s^{m,j}_{op} \cdot e^{m,j}_{op}\right) + \log\left(sen^{m,j}\right) \right]$$

Here, $m$ represents the $m$-th training instance with $M$ being the batch size, and $j$ represents the $j$-th decoding time step with $J$ being the length of the longest target sequence among all instances in the current batch. $s_p$, $e_p$, $p \in \{ap, op\}$, and $sen$ respectively represent the softmax scores corresponding to the true start and end positions of the aspect and opinion spans and their associated true sentiment label.
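The per-step term of this objective can be sketched as follows, assuming the softmax scores assigned to the true start position, true end position, and true sentiment label are already available. The dictionary keys and function names are hypothetical, and how the per-step values are aggregated or normalized over a batch is deliberately left to the caller.

```python
import math
from typing import Dict, List


def step_nll(probs: Dict[str, float]) -> float:
    """Negative log-likelihood of one decoding step.

    `probs` holds the softmax score of the TRUE label for each of the five
    predictions: aspect start/end, opinion start/end, and sentiment.
    """
    return -(
        math.log(probs["s_ap"] * probs["e_ap"])
        + math.log(probs["s_op"] * probs["e_op"])
        + math.log(probs["sen"])
    )


def sequence_nll(steps: List[Dict[str, float]]) -> float:
    """Sum of the per-step losses over one target sequence."""
    return sum(step_nll(p) for p in steps)


confident = {"s_ap": 0.9, "e_ap": 0.9, "s_op": 0.9, "e_op": 0.9, "sen": 0.9}
unsure = {"s_ap": 0.5, "e_ap": 0.5, "s_op": 0.5, "e_op": 0.5, "sen": 0.5}
print(step_nll(confident), step_nll(unsure))
```

The loss is zero only when all five true labels receive probability 1, and it grows as any of the four pointers or the sentiment prediction becomes less certain.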

Inferring The Triplets
Let $s^{ap}_i$, $e^{ap}_i$, $s^{op}_i$, $e^{op}_i$, $i \in [1, n]$, represent the obtained pointer probabilities for the $i$-th token in the given sentence (of length $n$) to be the start and end positions of an aspect span and an opinion span, respectively. First, we choose the start ($j$) and end ($k$) positions of the aspect span, under the constraint $1 \le j \le k \le n$, such that $s^{ap}_j \times e^{ap}_k$ is maximized. We then choose the start and end positions of the opinion span similarly, such that the opinion span does not overlap with the aspect span. Thus, we obtain one set of four pointer probabilities. We repeat the process to obtain the second set, this time by choosing the opinion span before the aspect span. Finally, we choose the set (of aspect and opinion spans) that gives the higher product of the four probabilities.
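The inference procedure above can be sketched as a brute-force search over all admissible spans. This is a minimal illustration, not the released implementation: the names `best_span` and `infer_pair` are hypothetical, positions are 0-based and inclusive (the text uses 1-based indices), and ties are broken in favor of the earliest span, which the text does not specify.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, float]  # (start, end, start_prob * end_prob)


def best_span(
    start_p: List[float], end_p: List[float], blocked: Optional[Tuple[int, int]] = None
) -> Span:
    """Pick (j, k) with j <= k maximizing start_p[j] * end_p[k], avoiding `blocked`."""
    best: Optional[Span] = None
    n = len(start_p)
    for j in range(n):
        for k in range(j, n):
            # Skip candidates that overlap the already-chosen span.
            if blocked is not None and not (k < blocked[0] or j > blocked[1]):
                continue
            score = start_p[j] * end_p[k]
            if best is None or score > best[2]:
                best = (j, k, score)
    if best is None:
        raise ValueError("no admissible span left")
    return best


def infer_pair(s_ap, e_ap, s_op, e_op):
    """Return ((aspect_start, aspect_end), (opinion_start, opinion_end))."""
    # Set 1: aspect span first, then a non-overlapping opinion span.
    a1 = best_span(s_ap, e_ap)
    o1 = best_span(s_op, e_op, blocked=a1[:2])
    # Set 2: opinion span first, then a non-overlapping aspect span.
    o2 = best_span(s_op, e_op)
    a2 = best_span(s_ap, e_ap, blocked=o2[:2])
    # Keep the set with the higher product of the four probabilities.
    if a1[2] * o1[2] >= a2[2] * o2[2]:
        return a1[:2], o1[:2]
    return a2[:2], o2[:2]


pair = infer_pair(
    [0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1],
    [0.5, 0.1, 0.3, 0.1], [0.4, 0.1, 0.4, 0.1],
)
print(pair)
```

In this toy input both pointers initially prefer token 0, so the two orderings disagree; the aspect-first set wins because its overall product is higher.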

Datasets and Evaluation Metrics
We conduct our experiments on the ASTE-Data-V2 dataset created by Xu et al. (2020b). It is derived from ASTE-Data-V1 (Peng et al., 2020) and presents a more challenging scenario, with 27.68% of all sentences containing triplets with overlapping aspect or opinion spans. The dataset contains triplet-annotated sentences from two domains, laptop and restaurant, corresponding to the original datasets released by the SemEval Challenge (Pontiki et al., 2014a,b,c). It is to be noted here that the opinion term annotations were originally derived from Fan et al. (2019). 14Lap belongs to the laptop domain, while 14Rest, 15Rest, and 16Rest belong to the restaurant domain and are additionally combined into a single Restaurant dataset. Tables 3 and 4 present the dataset statistics.

Sentiment-wise counts (Pos / Neg / Neu) per dataset and split:
Dataset     | Train (Pos / Neg / Neu) | Dev (Pos / Neg / Neu) | Test (Pos / Neg / Neu)
14Lap       | 817 / 517 / 126         | 169 / 141 / 36        | 364 / 116 / 63
14Rest      | 1692 / 480 / 166        | 404 / 119 / 54        | 773 / 155 / 66
15Rest      | 783 / 205 / 25          | 185 / 53 / 11         | 317 / 143 / 25
16Rest      | 1015 / 329 / 50         | 252 / 76 / 11         | 407 / 78 / 29
Restaurant  | 3490 / 1014 / 241       | 841 / 248 / 76        | 1497 / 376 / 120

We consider precision, recall, and micro-F1 as our evaluation metrics for the triplet extraction task. A predicted triplet is considered a true positive only if all three predicted elements exactly match those of a ground-truth opinion triplet.
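The exact-match evaluation can be sketched as follows, with triplets given as 5-point tuples. The function name `micro_prf` is hypothetical, and treating each sentence's predictions as a set (so that a repeated prediction is counted once) is an assumption of this sketch rather than a detail stated in the text.

```python
from typing import List, Sequence, Tuple

Triplet = Tuple[int, int, int, int, str]


def micro_prf(
    gold: Sequence[List[Triplet]], pred: Sequence[List[Triplet]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under exact triplet match."""
    tp = fp = fn = 0
    for gold_sent, pred_sent in zip(gold, pred):
        g, p = set(gold_sent), set(pred_sent)
        tp += len(g & p)   # all five fields match a gold triplet
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [[(0, 0, 2, 2, "POS"), (6, 7, 11, 11, "NEG")]]
pred = [[(0, 0, 2, 2, "POS"), (6, 6, 11, 11, "NEG")]]  # second aspect span is off by one
print(micro_prf(gold, pred))
```

A span that is off by a single token counts as both a false positive and a false negative, which is what makes the exact-match criterion strict.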

Experimental Setup
For our non-BERT experiments, word embeddings are initialized (and kept trainable) using pre-trained 300-dim. GloVe vectors (Pennington et al., 2014), and accordingly $d_w$ is set to 300. $d_{pos}$ and $d_{dep}$ are set to 50 each. $d_h$ is set to 300, and accordingly the hidden state dimensions of both the LSTMs (backward and forward) of the Bi-LSTM-based encoder are set to 150 each. $d_p$ is set to 300. For our BERT experiments, the uncased version of pre-trained BERT-base (Devlin et al., 2019) is fine-tuned to encode each sentence. All our model variants are trained end-to-end on a Tesla P100-PCIE 16GB GPU with the Adam optimizer (learning rate: $10^{-3}$, weight decay: $10^{-5}$). A dropout rate of 0.5 is applied on the embeddings to avoid overfitting³. We make our code and datasets publicly available⁴.

• Wang et al. (2017) (CMLA) and Dai and Song (2019) (RINANTE) propose different methods to co-extract aspects and opinion terms from review sentences. Li et al. (2019) propose a unified tagging scheme-based method for extracting opinion target-sentiment pairs. Peng et al. (2020) modify these methods to jointly extract targets with sentiment, and opinion spans, and then apply an MLP-based classifier to determine the validity of all possible generated triplets. These modified versions are referred to as CMLA+, RINANTE+, and Li-unified-R, respectively.
• Peng et al. (2020) propose a BiLSTM+GCN-based approach to co-extract aspect-sentiment pairs, and opinion spans. They then use the same inference strategy as above to confirm the correctness of the generated triplets.
• OTE-MTL (Zhang et al., 2020) uses a multi-task learning framework to jointly detect aspects, opinions, and sentiment dependencies.
• JET (Xu et al., 2020b) is the first end-to-end approach for the task of ASTE that leverages a novel position-aware tagging scheme. One of their variants, JET$^t$, however, cannot handle aspect-overlapped triplets. Similarly, JET$^o$ cannot handle opinion-overlapped triplets.
• GTS (Wu et al., 2020) models ASTE as a novel grid-tagging task. However, given that it predicts the sentiment relation between all possible word pairs, it uses a relaxed (majority-based) matching criterion to determine the final triplets.

Experimental Results
While training our model variants, the best weights are selected based on $F_1$ scores on the development set. We report our median scores over 5 runs of the experiment. Performance comparisons on the Laptop (14Lap) and combined Restaurant datasets are reported in Table 5, whereas those on the individual restaurant datasets are reported in Table 6. Both tables are divided into two sections, the former comparing the results without BERT, and the latter comparing those with BERT. The scores for CMLA+, RINANTE+, Li-unified-R, and (Peng et al., 2020) . For fair comparison, we replicate their results without using the domain-specific embeddings (DE). For both w/ and w/o BERT, we report their median scores over 5 runs of the experiment. We also report the $F_1$ scores on the development set corresponding to the test set results.
From Table 5, both our variants, PASTE-AF and PASTE-OF, perform comparably, and we substantially outperform all the non-BERT baselines. On Laptop, we achieve 13.1% $F_1$ gains over OTE-MTL, whereas on Restaurant, we obtain 2.2% $F_1$ gains over GTS-BiLSTM. We draw similar conclusions from Table 6, except that we are narrowly outperformed by JET$^o$ ($M = 6$) on 16Rest. Our better performance may be attributed to our better recall scores, with around 15.6% recall gains (averaged across both our variants) over the respective strongest baselines (in terms of $F_1$) on the Laptop and Restaurant datasets. Such an observation establishes the better efficacy of PASTE in modeling the interactions between the three opinion factors, as we are able to identify more ground-truth triplets from the data compared to our baselines.
With BERT, we comfortably outperform JET on all the datasets. Although we narrowly beat GTS-BERT on Laptop, it outperforms us on all the restaurant datasets. This is owing to the fact that GTS-BERT obtains a substantial improvement in scores over GTS, since its grid-tag prediction task and both the pre-training tasks of BERT are all discriminative in nature. We, on the other hand, do not observe such huge jumps ($F_1$ gains of 5.1%, 2.7%, 6.3%, and 3.3% on the Laptop, Rest14, Rest15, and Rest16 datasets respectively, with noticeably more improvement on datasets with less training data; no gains on Restaurant), since BERT is known to be unsuitable for generative tasks. We plan to improve our model by replacing BERT with BART (Lewis et al., 2020), a strong sequence-to-sequence pretrained model for NLG tasks.
Finally, motivated by Xu et al. (2019, 2020a), we also demonstrate the utility of leveraging domain-specific language understanding for the task by reporting our results with BERT-PT (task-agnostic post-training of pre-trained BERT on domain-specific data) in both the tables. While we achieve substantial performance improvement, we do not use these scores to draw our conclusions, in order to ensure fair comparison with the baselines.

Robustness Analysis
In order to better understand the relative advantage of our proposed approach when compared to our baselines for the opinion triplet extraction task, and to further investigate the reason behind our better recall scores, in Table 7 we compare the $F_1$ scores on various splits of the test sets as defined in Table 4. We observe that with our core architecture (w/o BERT), PASTE consistently outperforms the baselines on both the Laptop and Restaurant datasets when it comes to handling sentences with multiple triplets, especially those with overlapping aspect/opinion spans. This establishes that PASTE is better than previous tagging-based approaches in terms of modeling aspect-opinion span-level interdependence during the extraction process. This is an important observation considering the industry-readiness (Mukherjee et al., 2021b) of our proposed approach, since our model is robust towards challenging data instances. We, however, perform poorly when it comes to identifying triplets with varying sentiment polarities in the same sentence. This is understandable since we do not utilize any specialized sentiment modeling technique. In the future, we propose to utilize word-level Valence, Arousal, Dominance scores (Mukherjee et al., 2021a) as additional features to better capture the sentiment of the opinion phrase.
In this work, we propose a new perspective to solve ASTE by investigating the utility of a tagging-free scheme, as against all prior tagging-based methods. Hence, it becomes imperative to analyze how we perform in terms of identifying individual elements of an opinion triplet. Table 8 presents such a comparison. It is encouraging to note that we substantially outperform our baselines on both aspect and opinion span detection sub-tasks. However, as highlighted before, we are outperformed when it comes to sentiment detection.

Ablation Study:
Since our decoder learns to decode the sequence of triplets from left to right without repetition, while training our models we sort the target triplets in the same order as the generation direction; i.e., for training PASTE-AF/PASTE-OF, the target triplets are sorted in ascending order of aspect/opinion start positions. As an ablation, we sort the triplets randomly while training the models and report our obtained scores in Table 9. An average drop of 9.3% in $F_1$ scores for both our model variants establishes the importance of sorting the triplets for training our models. When experimenting without the POS and DEP features, we further observe an average drop of 2.3% in $F_1$ scores, thereby demonstrating their utility for the ASTE task. When experimenting with BERT, although these features helped on the Laptop and Rest15 datasets, overall we did not observe any significant improvement.
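The target ordering used for the two variants can be sketched as a simple sort over 5-point tuples. The function name `sort_targets` and the `direction` argument are hypothetical. The text specifies only the start position as the sort key, so ties are left in their input order here (Python's sort is stable), which is an assumption of this sketch.

```python
from typing import List, Tuple

Triplet = Tuple[int, int, int, int, str]


def sort_targets(triplets: List[Triplet], direction: str = "AF") -> List[Triplet]:
    """Order target triplets to match the generation direction.

    "AF" (aspect first) sorts by aspect start position (field 0);
    "OF" (opinion first) sorts by opinion start position (field 2).
    """
    if direction == "AF":
        return sorted(triplets, key=lambda t: t[0])
    if direction == "OF":
        return sorted(triplets, key=lambda t: t[2])
    raise ValueError(f"unknown direction: {direction!r}")


targets = [(9, 9, 11, 11, "NEG"), (0, 0, 2, 2, "POS"), (6, 7, 11, 11, "NEG")]
print(sort_targets(targets, "AF"))
print(sort_targets(targets, "OF"))
```

Under "OF", the two triplets sharing the opinion span (11, 11) tie on the sort key, which illustrates that the ordering of overlapping triplets depends on the tie-breaking rule chosen.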

Related Works
ABSA is a collection of several fine-grained sentiment analysis tasks, such as Aspect Extraction (Li et al., 2018b), Aspect-level Sentiment Classification (Li et al., 2018a; Xue and Li, 2018), Aspect-oriented Opinion Extraction (Fan et al., 2019), E2E-ABSA (Li et al., 2019; He et al., 2019), and Aspect-Opinion Co-Extraction (Wang et al., 2017; Dai and Song, 2019). However, none of these works offer a complete picture of the aspects being discussed. Towards this end, Peng et al. (2020) recently coined the task of Aspect Sentiment Triplet Extraction (ASTE), and proposed a 2-stage pipeline solution. More recent end-to-end approaches such as OTE-MTL (Zhang et al., 2020) and GTS (Wu et al., 2020) fail to guarantee sentiment consistency over multi-word aspect/opinion spans, since they depend on word-pair dependencies. JET (Xu et al., 2020b), on the other hand, requires two different models to be trained to detect aspect-overlapped and opinion-overlapped triplets. Different from all these tagging-based methods, we propose a tagging-free solution for the ASTE task.

Conclusion
We investigate the utility of a tagging-free scheme for the task of Aspect Sentiment Triplet Extraction using a Pointer network-based decoding framework. Addressing the limitations of previous tagging-based methods, our proposed architecture, PASTE, not only exploits the aspect-opinion interdependence during the span detection process, but also models the span-level interactions for sentiment prediction, thereby truly capturing the interrelatedness between all three elements of an opinion triplet. We demonstrate the better efficacy of PASTE, especially in recall, and in predicting multiple and/or overlapping triplets, when experimenting on the ASTE-Data-V2 dataset.