Interpreting Sentiment Composition with Latent Semantic Tree

As the key to sentiment analysis, sentiment composition considers the classification of a constituent via the classifications of its contained sub-constituents and the rules operating on them. Such compositionality has been widely studied previously in the form of hierarchical trees, including untagged and sentiment ones, which we view as intrinsically suboptimal. To address this, we propose the semantic tree, a new tree form capable of interpreting sentiment composition in a principled way. A semantic tree is a derivation of a context-free grammar (CFG) describing specific composition rules on different semantic roles, designed carefully following previous linguistic conclusions. However, the semantic tree is a latent variable, since regular datasets carry no annotation for it. Thus, in our method, it is marginalized out via the inside algorithm and learned to optimize the classification performance. Quantitative and qualitative results demonstrate that our method not only achieves better or competitive results compared to baselines in both regular and domain-adaptation classification settings, but also generates plausible tree explanations.


Introduction
Sentiment classification is a task to determine the sentiment polarity of a sentence (Yadav and Vishwakarma, 2020; Dang et al., 2020). Current research on this task is gradually shifting from improving model performance to interpretability. As the best-known stream, feature-based explanation tries to figure out which input feature, say a word, has the most influence on the prediction, in the form of a salience score or rationale, and in both self-explanatory and post-hoc settings (Li et al., 2016; Ribeiro et al., 2016; Kim et al., 2020; Lei et al., 2016; Bastings et al., 2019; De Cao et al., 2020). However, this task requires sentiment composition (Polanyi and Zaenen, 2006), which is beyond the ability of these feature-based explanations. (Data and code are available at https://github.com/changmenseng/semantic_tree.)

Figure 1: Different tree structures for explaining sentiment composition, where the semantic tree can explain the sentiment composition in the inverted-V structure, as shown in the box of (c).
To be concrete, sentiment composition considers the classification of a constituent via 1) the classifications of its contained sub-constituents and 2) the rules operating on them (Moilanen and Pulman, 2007), as shown in Figure 1(c). Thus, the classification of a sentence is decomposed into hierarchical sentiment compositions of its sub-constituents. Such compositionality has been widely studied previously in the form of hierarchical trees, including the untagged tree and the sentiment tree, as shown in Figure 1. The untagged tree is usually modeled as a latent variable and learned via the task objective (Yogatama et al., 2017; Maillard and Clark, 2018; Choi et al., 2018; Havrylov et al., 2019; Chowdhury and Caragea, 2021). Then, a TreeLSTM (Tai et al., 2015; Zhu et al., 2015) is adopted to encode the sentence following the hierarchy for the final prediction. However, the untagged tree is limited because it can only explain the hierarchy but not give labels to its nodes. The sentiment tree takes a further step: every node within it has a polarity score or label. As the most representative example, Socher et al. (2013) created the Stanford Sentiment Treebank (SST), which has sentiment tree annotations. The sentiment tree also appears as a post-hoc explanation giving hierarchical attribution scores (Chen et al., 2020; Zhang et al., 2020). However, in fact, not every constituent is sentimental; some are more functional. For example, while the negator "not" is sentimentally neutral, it can functionally flip the sentiment of a constituent. Sentiment labels are therefore not sufficient to explain such phenomena.
To overcome these defects, we propose the semantic tree, a new tree form capable of explicitly interpreting sentiment composition in a principled way. In the semantic tree, each node is assigned a semantic label, which can be sentimental or functional, and each local inverted-V structure reveals the rule composing adjacent constituents, as shown in Figure 1(c). Formally, inspired by Dong et al. (2015), the semantic tree is a derivation of a context-free grammar (CFG) (Chomsky, 1956) defined by non-terminal symbols (semantic labels), terminal symbols (word vocabulary), rules, and root symbols (positive and negative). The challenge of designing such a grammar lies in designing the semantic labels and rules, which requires linguistic knowledge of sentiment composition. To address this, we follow previous work on sentiment composition (Polanyi and Zaenen, 2006; Moilanen and Pulman, 2007; Taboada et al., 2011) to carefully design 11 semantic labels and 62 rules. We believe the grammar covers most cases in sentiment analysis, as shown in Table 1.
We aim to learn a model capable of extracting the semantic tree using data consisting of only sentence-label pairs, which is challenging because the semantic tree is latent without full annotation.
To address this, we first build a semantic tree parser, and then marginalize out the semantic tree to induce a sentiment classifier, which allows supervised training on such data. Fortunately, this marginalization over the exponentially large tree space is computationally tractable via the inside algorithm (Baker, 1979). This process can be abstracted as a module, namely the sentiment composition module (SCM), which computes the compatibility of a prediction from the viewpoint of sentiment composition rather than pattern recognition alone. By pairing an arbitrary neural text encoder with the proposed SCM, we can build a self-explanatory model that not only predicts the sentiment label but also generates a semantic tree as the explanation. To learn more plausible semantic trees, we further propose two extra objectives: one to guide the preterminals in the semantic tree, and one to make the tree structure more syntactically meaningful.
We conduct experiments on three datasets, including MR (Pang and Lee, 2005), SST2 (Socher et al., 2013), and Amazon (Blitzer et al., 2007), in both regular and cross-domain classification settings. Quantitative and qualitative results demonstrate that our method not only achieves better or competitive results compared to baselines, but also generates plausible tree explanations.

Problem Formalization
The dataset is a collection of tuples $\{(x_n, y_n)\}_{n=1}^N$, each of which contains a sentence $x \in \mathcal{V}^*$ and a sentiment label $y \in \mathcal{Y}$, where $\mathcal{V}$ is the word vocabulary and $\mathcal{Y} = \{P, N\}$ is the label set consisting of positive ($P$) and negative ($N$). The task goal is to learn a classifier $p(y|x)$. Since we hope to generate a semantic tree of the input sentence whose root label is the sentiment label, as shown in Figure 1(c), the objective classifier $p(y|x)$ is not directly parameterized by a discriminative model as usual. Instead, we define the classifier as the marginalization of a parser over the latent semantic tree, in which the parser fulfills this purpose. Concretely, let $\mathcal{T}_x(y)$ be the set of all semantic trees rooted in $y$. Naturally, we have:

$$p(y|x) = \sum_{t \in \mathcal{T}_x(y)} p(t|x), \quad (1)$$

where $p(t|x)$ is a semantic tree parser that accepts a sentence and generates a semantic tree. We can conduct supervised learning once the classifier $p(y|x)$ is obtained, and the parser $p(t|x)$ is implicitly learned in this process. After training, the model makes predictions via the induced classifier $p(y|x)$, and generates the semantic tree to reveal the sentiment composition process behind them. The first issue before solving the summation in Equation (1) is to formalize the semantic tree. For simplicity, we assume that the label of a constituent is determined immediately by its sub-constituents, regardless of the surrounding context. Therefore, the semantic tree is viewed as a derivation of a CFG that defines specific semantic labels and composition rules. Now, two challenges remain: 1) How to properly define the CFG behind the semantic tree? 2) How to model the parser $p(t|x)$ and efficiently compute the classifier $p(y|x)$? We elaborate on these two problems in Section 2.2 and Section 2.3, respectively.

Sentiment Composition Grammar
The proposed semantic tree is described by a context-free grammar G consisting of a quadruple: the non-terminal symbol set N (semantic label set), the terminal symbol set V (word vocabulary), the composition rule set R, and the root symbol set Y (P and N). While V and Y are obvious, the design of the semantic labels (N) and composition rules (R) requires expert knowledge. Fortunately, previous works have catalogued the different types of compositions exhaustively (Polanyi and Zaenen, 2006; Moilanen and Pulman, 2007; Taboada et al., 2011), inspiring us to design 11 semantic labels and 62 composition rules. We call the proposed grammar a sentiment composition grammar (SCG).

Semantic Labels
The 11 semantic labels fall into two types:

Sentimental labels: negative N, positive P, neutral O.

Functional labels: negator D, irrealis blocker I, priority riser +, priority reducer −, high negative N+, high positive P+, low negative N−, low positive P−.
We shall explain these labels together with composition rules later.
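For concreteness, the label set above can be written down as a small data structure. The sketch below is ours, not the paper's code; the member names are our own shorthand for the symbols N, P, O, D, I, +, −, N+, P+, N−, P−.

```python
from enum import Enum

class SemanticLabel(Enum):
    """The 11 semantic labels of the sentiment composition grammar."""
    NEG = "N"        # negative (sentimental)
    POS = "P"        # positive (sentimental)
    NEU = "O"        # neutral (sentimental)
    NEGATOR = "D"    # functional: flips a non-neutral polarity
    IRREALIS = "I"   # functional: blocks/neutralizes a polarity
    RAISER = "+"     # functional: raises priority
    REDUCER = "-"    # functional: reduces priority
    HI_NEG = "N+"    # negative with raised priority
    HI_POS = "P+"    # positive with raised priority
    LO_NEG = "N-"    # negative with reduced priority
    LO_POS = "P-"    # positive with reduced priority

# The three sentimental labels; the remaining eight are functional
# or priority-marked polarities.
SENTIMENTAL = {SemanticLabel.NEG, SemanticLabel.POS, SemanticLabel.NEU}
```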

Composition Rules
Formally, a composition rule is of the form β → A (A ∈ N, β ∈ (N ∪ V)*), which determines the label of a constituent given its sub-constituents. We include three types of rules. The first is the binary rule, of the form BC → A (A, B, C ∈ N). Binary rules are defined following common binary compositions, which mainly include four types according to previous works and our observations. We now introduce each composition and its corresponding rules.
Polarity propagation Propagating the polarity of a constituent to its parent.

Negation Flipping the non-neutral polarity (P/N) via a negator (D).

Conflict resolution Resolving the conflict between non-neutral polarity constituents (P/N) by ranking their priorities based on priority modifiers (+/−). As a typical example, Figure 1 shows a contrastive conjunction (Socher et al., 2013) structure, in which the first and second halves of the sentence have opposite polarities. The connector "but" is a priority riser (+) that raises the priority of the second half, which then dominates the polarity of the entire sentence. Similarly, there also exist priority reducers (−) such as "although". The rules related to this composition include those for priority modification and those for resolution. We do not allow a polarity with priority (N+/N−/P+/P−) without an explicit modifier +/−, since a single word with non-neutral polarity cannot have priority on its own.

Irrealis blocking Neutralizing the non-neutral polarity (P/N) via an irrealis blocker (I). A blocker such as the modal "would" or the connector "if" sets up a context about the possibility of some polarity not necessarily expressed by the author. As a result, the literal polarity is canceled.
The full binary rule list is shown in Table 6 in Appendix A. The second type is the terminal-unary rule, which defines the legal preterminals of single words and is of the form ω → A (A ∈ N_pret = {N, P, O, D, I, +, −}, ω ∈ V). As introduced, A cannot be a polarity with priority (N+/N−/P+/P−).
We further define the preterminal-unary rule as the third type. These rules can only, and must, appear in the second layer of the semantic tree; they are designed to cancel the function of misrecognized functional constituents, which leads to better performance in our experiments.
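To make the rule format concrete, the four compositions above can be sketched as a small lookup table mapping pairs of child labels to a parent label. This is a minimal illustrative subset written by us, not the paper's full 62-rule list (that is in its Table 6); in particular, the exact outputs of the conflict-resolution rules here are our assumption.

```python
# Illustrative binary rules, written as (B, C) -> A.
BINARY_RULES = {
    # Polarity propagation: a neutral constituent passes polarity through.
    ("P", "O"): "P", ("O", "P"): "P",
    ("N", "O"): "N", ("O", "N"): "N",
    ("O", "O"): "O",
    # Negation: a negator D flips a non-neutral polarity.
    ("D", "P"): "N", ("D", "N"): "P",
    # Priority modification: "+" raises the priority of what follows.
    ("+", "P"): "P+", ("+", "N"): "N+",
    # Conflict resolution: the higher-priority half dominates
    # (the resulting parent label is our assumption).
    ("P", "N+"): "N", ("N", "P+"): "P",
    # Irrealis blocking: "I" neutralizes a literal polarity.
    ("I", "P"): "O", ("I", "N"): "O",
}

def compose(left, right):
    """Return the parent label for two adjacent constituents, or None
    if no rule applies to this pair."""
    return BINARY_RULES.get((left, right))
```

For example, `compose("D", "P")` models "not good": the negator flips positive to negative.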

Sentiment Composition Module
We now answer the second question: how to model the parser p(t|x) and compute the classifier p(y|x). We show that this process naturally leads to the sentiment composition module.

Semantic Tree Parser
First, we represent the semantic tree $t$ of a sentence $x = (x_0, \cdots, x_{T-1})$ by the set of its anchored rules (Eisner, 2016), each consisting of a rule and its location indices, where $A_{ij}$ ($0 \le i < j \le T$) is an anchored node suggesting a label $A$ covering the constituent ranging from $x_i$ to $x_{j-1}$. $A_i$ is short for $A_{i,i+1}$, a unary anchored node covering the word $x_i$. Thus, $\langle B_{ik}C_{kj} \to A_{ij}\rangle$, $\langle B_i \to A_i\rangle$, and $\langle x_i \to A_i\rangle$ represent the binary, preterminal-unary, and terminal-unary anchored rules, respectively. The semantic tree parser $p(t|x)$ is defined by a Gibbs distribution over the anchored rules in a tree (Finkel et al., 2008; Durrett and Klein, 2015):

$$p(t|x) = \frac{1}{Z(x)} \prod_{a \in t} \phi(a), \quad (8)$$

where $Z(x)$ is the partition function for normalization. $\phi(a) > 0$ is the potential function of the anchored rule $a$, defined in the exponential form $\exp(s(a))$, where $s(a)$ is a score rating how comfortable it is for $a$ to appear in the tree. Scores for the different types of anchored rules are defined as sums of subscores rating the comfortableness of the corresponding substructures.
Here, the scores of the binary and preterminal-unary rules, s_rule(BC → A) and s_rule(B → A), are scalar parameters. The other scores are modeled by neural networks, where · is the vector dot product and the w and b terms are learnable parameters. h^l_ij is the phrase representation of the constituent x_ij at layer l, computed by a text encoder m over the word embeddings e_i. Note that we compute s_label and s_span using top-layer phrase representations, but compute the terminal-unary score using a lower-layer one. This is because recognizing a preterminal is easier than determining whether its label is cancelled: the simpler lower-layer phrase representation is sufficient for the former, while the more "contextual" top-layer one h^L_ij is favored by the latter.

Inducing the Classifier from the Parser
As shown in Equation (1), the classifier is induced by marginalizing over all the semantic trees of the input sentence, which can be efficiently done by the inside algorithm. To illustrate this, we first let $\mathcal{T}_x(A_{ij})$ and $\mathcal{T}_x(B_{ik}C_{kj} \to A_{ij})$ be the sets of subtrees of sentence $x$ that are covered by the anchored node $A_{ij}$ and the anchored rule $B_{ik}C_{kj} \to A_{ij}$, respectively. The inside algorithm defines the inside term

$$\alpha_x(A_{ij}) = \sum_{t \in \mathcal{T}_x(A_{ij})} \prod_{a \in t} \phi(a),$$

which is the total potential of the subtrees covered by $A_{ij}$. The inside term is computed recursively in a bottom-up manner:

$$\alpha_x(A_{ij}) = \sum_{BC \to A \in R} \sum_{k=i+1}^{j-1} \phi(B_{ik}C_{kj} \to A_{ij})\, \alpha_x(B_{ik})\, \alpha_x(C_{kj}), \quad (12)$$

where $\alpha_x(A_i)$ is the initial value of this recursion. Obviously, the time complexity of the inside algorithm is $O(|R|T^3)$. It can be shown that the inside term of the root anchored node, $\alpha_x(A_{0T})$, abbreviated as $\alpha_x(A)$, equals the unnormalized probability that the root label of the semantic tree is $A$. As seen, the resulting logit in the softmax includes an extra score $s_{\mathrm{SCM}}(A, x)$ as a complement to the regular one $s_{\mathrm{label}}(A, x)$, where the former and the latter can be understood as the accordance of assigning the label $A$ by means of sentiment composition and pattern recognition, respectively. Thus, we call $s_{\mathrm{label}}$ and $s_{\mathrm{SCM}}$ the recognition module and the sentiment composition module, respectively. While the recognition module is learned only from data, the sentiment composition module incorporates general and invariant human knowledge in the form of sentiment composition rules, which is more robust for domain adaptation, as we shall see in Section 4.1.
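The inside recursion above can be sketched in a few lines of Python over a toy weighted grammar. The potentials, rules, and the example sentence are made up for illustration; a real implementation would batch this over spans and work in log space for numerical stability.

```python
from collections import defaultdict

def inside(words, terminal_pot, binary_pot, binary_rules):
    """Inside algorithm over a toy weighted CFG (a sketch of Eq. 12).

    terminal_pot[(word, A)] : potential phi of the rule word -> A
    binary_pot[(B, C, A)]   : potential phi of the rule B C -> A
    Returns alpha[(i, j)][A] = total potential of subtrees spanning
    words[i:j] rooted in A.  Runs in O(|R| * T^3) time.
    """
    T = len(words)
    alpha = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):                 # initialization
        for (word, A), phi in terminal_pot.items():
            if word == w:
                alpha[(i, i + 1)][A] += phi
    for width in range(2, T + 1):                 # bottom-up recursion
        for i in range(T - width + 1):
            j = i + width
            for k in range(i + 1, j):             # split point
                for (B, C, A) in binary_rules:
                    phi = binary_pot.get((B, C, A), 0.0)
                    alpha[(i, j)][A] += phi * alpha[(i, k)][B] * alpha[(k, j)][C]
    return alpha

# Tiny hypothetical example: "not good", a negator and a positive word.
term = {("not", "D"): 1.0, ("good", "P"): 2.0}
rules = [("D", "P", "N")]
binp = {("D", "P", "N"): 1.5}
alpha = inside(["not", "good"], term, binp, rules)
root = alpha[(0, 2)]
# Classifier over roots {P, N}: normalize the root inside terms.
Z = sum(root.get(y, 0.0) for y in ("P", "N"))
p_N = root["N"] / Z
```

Here the only derivation roots the sentence in N with potential 1.5 × 1.0 × 2.0 = 3.0, so the induced classifier puts all its mass on the negative label.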
The last issue is that the proposed SCM is intractable for long documents due to its cubic time complexity in the length. For a document, we therefore first split it into sentences and compute their individual logits. Document logits are then aggregated by attention over those sentence logits, where the attention weights are computed from sentence representations.
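The document-level aggregation can be sketched as follows. The attention scores are taken as given here (in the paper they come from sentence representations); this simplified stand-alone version is our own.

```python
import math

def doc_logits(sentence_logits, attn_scores):
    """Aggregate per-sentence logits into document logits using
    softmax attention weights over the sentences."""
    m = max(attn_scores)                           # for numerical stability
    exps = [math.exp(s - m) for s in attn_scores]
    Z = sum(exps)
    w = [e / Z for e in exps]                      # attention weights
    n_classes = len(sentence_logits[0])
    return [sum(w[i] * sentence_logits[i][c] for i in range(len(w)))
            for c in range(n_classes)]
```

With uniform attention scores this reduces to averaging the sentence logits.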

Training & Testing
Now that we have the induced classifier, we can apply supervised training by minimizing the negative log-likelihood −log p(y_n|x_n). This objective might be enough for classification, but not for a plausible semantic tree explanation. Cases do exist in which a semantic tree reaches the right root label with wrong preterminals and an improper structure. For example, if we choose BERT (Devlin et al., 2019) as the encoder, the method might assign a non-neutral polarity to [CLS] and recognize all other tokens as neutral, since the [CLS] representation is usually treated as the sentence representation. An effective way to improve plausibility is to learn the explanation via more explicit annotations (Strout et al., 2019; Zhong et al., 2019), even if those annotations are weak or incomplete. Therefore, we additionally introduce two objectives to regularize the tree.
For preterminal plausibility, we construct a lexicon to annotate the preterminal sequence of each sentence and conduct weakly-supervised learning on these annotations. As introduced, there are 7 preterminals in the proposed grammar: 3 sentimental and 4 functional. We utilize SentiWordNet (Baccianella et al., 2010) to annotate non-neutral sentimental labels, and the stopword lists in the NLTK and spaCy libraries to annotate neutral ones. For functional labels, we manually build a lexicon based on the irrealis blockers and priority modifiers from Taboada et al. (2011), and the negators in Loughran and McDonald (2011). The functional lexicon is shown in Table 7 in Appendix B. Let o_n be the annotated preterminal sequence of the sentence x_n, and S_n be the set containing the indices of all annotated words. We then optimize a conditional log-likelihood over the annotated positions, based on the terminal-unary score function in Equation (10).

For structural plausibility, we annotate a syntactic tree for each sentence with the Berkeley parser (Kitaev and Klein, 2018; Kitaev et al., 2019), a SOTA parser based on T5 (Raffel et al., 2020) and trained on the Penn Treebank (PTB) (Taylor et al., 2003). We convert each tree to left-branching Chomsky normal form (CNF) (Chomsky, 1963) and omit the non-terminal labels to obtain the tree skeleton. Our goal is to make the semantic tree structure resemble the annotated PTB tree structure. Given the annotated skeleton k_n of the sentence x_n, we minimize the negative conditional log-likelihood −log r(k_n|x_n), where r(k|x) is defined by a Gibbs distribution with one span score factor (Equation (10)) per span c in the skeleton k. The normalization term Z′(x) is also computed via the inside algorithm, similarly to Equation (12). The final objective is a linear combination of the above three objectives. When the model is well trained, it can not only predict the sentiment label but also generate the semantic tree as the explanation, by decoding the semantic tree with the maximal conditional probability, which is solved by the CKY algorithm (Kasami, 1965; Daniel, 1967).
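Decoding the best tree replaces the inside algorithm's sum with a max plus backpointers. Below is a minimal CKY/Viterbi sketch in Python with hypothetical log-space scores (so scores add); the function and variable names are ours, not the paper's.

```python
def cky_decode(words, terminal_score, binary_score, binary_rules):
    """CKY/Viterbi decoding of the highest-scoring tree under a toy
    grammar.  Scores are log-potentials, so they are summed."""
    T = len(words)
    best, back = {}, {}
    for i, w in enumerate(words):                 # preterminal cells
        for (word, A), s in terminal_score.items():
            if word == w:
                best[(i, i + 1, A)] = s
    for width in range(2, T + 1):                 # larger spans, bottom-up
        for i in range(T - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (B, C, A) in binary_rules:
                    if (i, k, B) in best and (k, j, C) in best:
                        s = (binary_score[(B, C, A)]
                             + best[(i, k, B)] + best[(k, j, C)])
                        if s > best.get((i, j, A), float("-inf")):
                            best[(i, j, A)] = s
                            back[(i, j, A)] = (k, B, C)

    def build(i, j, A):
        """Follow backpointers to rebuild the tree as nested tuples."""
        if j == i + 1:
            return (A, words[i])
        k, B, C = back[(i, j, A)]
        return (A, build(i, k, B), build(k, j, C))

    root = max((A for (i, j, A) in best if (i, j) == (0, T)),
               key=lambda A: best[(0, T, A)])
    return build(0, T, root)
```

For "not good" with a single rule D P → N, this recovers the tree (N, (D, "not"), (P, "good")).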

Experiments
In this section, we conduct experiments to illustrate that the proposed SCM improves classification accuracy.

Datasets
We adopt MR (Pang and Lee, 2005) and SST2 (Socher et al., 2013) in this experiment. MR contains 10,662 movie reviews, half positive and half negative. Since it has no train/dev/test split, we follow convention and conduct 10-fold cross-validation. SST2 is built from SST by binarizing the 5-class sentiment labels. Common settings of SST2 include SST2-S, which uses only sentences for training, and SST2-P, which uses all labeled non-neutral phrases for training; their training sizes are 6,920 and 98,794, respectively. In both settings, there are 872/1,821 sentences for validation/testing.

Implementation
We utilize BiLSTM (Hochreiter and Schmidhuber, 1997) and BERT (Devlin et al., 2019) (base version) as backbone encoders for modeling the constituent representations. For both models, we use the first-layer representations to compute the terminal-unary scores. We use momentum-based gradient descent (Qian, 1999) with a momentum of 0.9, along with a cosine annealing learning rate schedule (Loshchilov and Hutter, 2017), to optimize our models. For detailed hyper-parameter settings, please check the configuration files in our publicly available repository.
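The cosine annealing schedule can be written down in a few lines; this is the standard formula from Loshchilov and Hutter (2017), and the concrete step counts and learning rates below are hypothetical, not the paper's settings.

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: the learning rate decays from lr_max at
    step 0 to lr_min at step total_steps along a cosine curve."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1.0 + math.cos(math.pi * step / total_steps))
```

For example, with `lr_max=0.1` over 100 steps, the rate starts at 0.1, passes 0.05 at the midpoint, and ends at 0.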

Baselines
Compared models include sequential models and three types of tree models: sentiment tree models, untagged tree models, and latent untagged tree models. All tree models utilize recursive neural networks (RvNNs) (Socher et al., 2011) to model phrases in the sentence following a tree structure. Sentiment tree models have full sentiment tree supervision and are learned to predict the labels of all nodes in the tree. By contrast, the tree structures of untagged tree models are obtained from an external parser, and only the root node label is available for training. Latent untagged tree models learn to generate the tree structure themselves, implicitly supervised by the task objective.

Results
We report the accuracy of different models in Table 2, from which we find that: 1) Compared to the original sequential models, adding the proposed SCM steadily improves the classification accuracy for both the BiLSTM and BERT encoders on all datasets and settings, directly reflecting the effectiveness of our method. 2) Armed with the proposed SCM, the sequential BiLSTM achieves better or competitive performance compared with previous tree models on both datasets and settings. Notably, it outperforms all baselines on SST2-S. This might suggest that the hierarchical RvNN is not necessarily the best way to model compositions, and that a flat sequential model can do just as well. 3) We also admit that the performance improvement from our method is not huge, as our BiLSTM model does not surpass all compared models on MR and SST2-P. However, since our motivation is interpretability, we believe the performance is sufficient.

Sentiment Domain Adaptation
We conduct experiments in the cross-domain setting, adopting Amazon, a widely-used domain adaptation dataset collected by Blitzer et al. (2007). It contains review documents from the Amazon website in four domains: Books (B), DVDs (D), Electronics (E), and Kitchen & Housewares (K), each containing 2,000 labeled reviews. Following previous works, the model is trained on one domain and tested on the other three, yielding 12 cross-domain sentiment classification subtasks. For each subtask, we randomly sample 1,600 examples in the source domain for training and leave the other 400 examples for validation.
We report the accuracy on the different subtasks in Table 3. As seen, compared to the original sequential models, adding the proposed SCM improves the adaptation accuracy in most cases, and on average as well, especially for the BiLSTM trained from scratch. The improvement originates from the domain-invariant human knowledge injected by the proposed SCM, which helps the model be less sensitive to the domain. The performance improvement for the pretrained BERT is less significant because pretraining has already endowed it with generalization ability.

Ablation Study
We conduct an ablation study on SST2-S to study the effects of different components, including the grammar and the two plausibility objectives. We report the accuracy and the unlabeled tree F1 of the generated semantic trees w.r.t. the PTB trees generated by the Berkeley parser for each model in Table 4. We find that the grammar does not work alone: when the two plausibility objectives are absent, the accuracy drops compared to the original encoder. We speculate this is due to the lack of direct information about functional labels, making it easier to misrecognize them. Such errors would accumulate from bottom to top in the tree and pollute other sentences containing the same constituent, causing the performance drop.
The preterminal plausibility objective L_pos alleviates this issue effectively, with an obvious performance improvement for both encoders. As for the structural plausibility objective L_str, though it makes the tree structure more syntactically meaningful with a higher unlabeled tree F1, it does not necessarily guarantee a performance improvement. This suggests that the optimal tree structure might not exactly resemble the PTB tree structure. On the contrary, the tree structure learned without L_str, which has little similarity to the PTB tree structure, is also suboptimal, with mediocre accuracy. To study the optimal tree structure, we vary the balancing factor ω_str and obtain models with different unlabeled tree F1 w.r.t. PTB trees and different accuracies. We then visualize the relation between these two metrics in Figure 3. We can see that, for both encoders, the accuracy roughly shows a trend of first increasing and then decreasing as the tree becomes more syntactically meaningful (i.e., attains a higher unlabeled tree F1). This is contrary to Williams et al. (2018), who find that the optimal tree structures of the untagged tree methods RL-SPINN (Yogatama et al., 2017) and Gumbel-Tree (Choi et al., 2018) do not resemble PTB tree structures. This might be because our method has a specific grammar with syntactic information restraining the tree structure, while untagged tree methods accommodate any structure.

Effects of SCG
To show the effectiveness of the proposed SCG, we compare it with the glue grammar (Taboada et al., 2011), whose binary rules are very free, taking the form BC → A (A, B, C ∈ {P, N, O}). Such rules act like glue, connecting adjacent constituents with any polarities. The results are shown in Table 5: our proposed SCG is more effective, with better accuracy than the glue grammar. We think this is because the glue grammar rules are too unconstrained to carry specific sentiment composition knowledge, which makes them unhelpful for the task.

Qualitative Study
We qualitatively show a few examples in Figure 4 to demonstrate that our method can handle compound sentiment compositions. The first case is a sentence with two negative constituents joined by a coordinating conjunction, each of which contains an irrealis blocking. The second case is a sentence with negation under conflict resolution. In both cases, the prediction is not simple, since the model is susceptible to the surface and literal meanings in the sentence, which might interfere with the correct decision. By modeling sentiment composition explicitly, our method successfully judges the semantic roles of the different constituents, and finally composes plausible tree explanations.

Related Works
Sentiment composition is one of the keys to sentiment analysis; it considers the semantics of a constituent from both the recognition and composition views (Polanyi and Zaenen, 2006; Moilanen and Pulman, 2007). That is, it decomposes the classification of a sentence into a hierarchical tree structure explicitly showing how the polarity of the sentence comes from the composition of its sub-constituents. Early works are mainly based on manual rules and sentiment lexicons constructed either manually (Wilson et al., 2005; Kennedy and Inkpen, 2006) or automatically (Dong et al., 2015; Toledo-Ronen et al., 2018). Nowadays, represented via different forms of trees, sentiment composition is often learned explicitly or implicitly in the end-to-end learning manner of neural network models.
Common tree forms include the untagged tree and the sentiment tree, and the learning paradigms in the literature are also varied. Concretely, the untagged tree can either be obtained directly from an external syntactic parser (Socher et al., 2012; Tai et al., 2015; Liu et al., 2017a,b; Kim et al., 2019), or serve as a latent variable learned implicitly (Yogatama et al., 2017; Maillard and Clark, 2018; Choi et al., 2018; Havrylov et al., 2019; Chowdhury and Caragea, 2021). Compared to the untagged one, the sentiment tree offers more information about the sentiment polarity of each constituent in the tree. As the most representative resource in this form, SST (Socher et al., 2013) formalizes sentiment composition as a parsing task, motivating many works to learn the tree in a supervised manner (Teng and Zhang, 2017; Zhang and Zhang, 2019; Zhang et al., 2019). The sentiment tree is also a popular explanation form for post-hoc interpretability, since it can provide hierarchical attribution scores (Chen et al., 2020; Zhang et al., 2020). While both existing forms are useful, they are suboptimal due to their inability to explicitly interpret sentiment composition, a gap our proposed semantic tree fills.

Conclusions
In this paper, we present the semantic tree to explicitly interpret sentiment composition in sentiment classification. We carefully design a grammar covering common compositions, drawing on linguistic insights, and learn to extract semantic tree explanations without full annotations. Quantitative and qualitative results demonstrate that our method is effective and can generate plausible tree explanations.

Limitations & Ethics Statement
Our method is first limited by the proposed grammar, which does not cover all realistic cases. As shown in Table 1, there are still a few cases in the randomly sampled 100 examples that none of the defined rules can explain. Second, the time complexity of our method is cubic in the sentence length, limiting its direct application to long documents. We therefore have to classify a document based on the classifications of its individual sentences, which might be problematic since the sentiments of different sentences in a document may affect each other.
All the experiments in this paper are conducted on publicly available datasets, so there are no data privacy concerns. Meanwhile, this paper involves no human annotation, so there are no related ethical concerns.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61976211, No. 62276264) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA27020100). This research was also supported by Meituan.

Figure 3: The relation between the accuracy and the unlabeled tree F1 on SST2-S.

Figure 4: Semantic trees of compound sentiment compositions, generated by BiLSTM+SCM. We flatten the polarity propagation rules for compactness.

Table 1: The number of occurrences of each composition in the 100 sampled sentences from SST2 and MR.

We also present examples of these compositions in Figure 2. These compositions appear very commonly: we randomly sample 100 examples each from SST2 and MR and count the occurrences of the above compositions, finding that 97 and 98 examples in SST2 and MR, respectively, can be explained by them. Thus, we believe our rules can cover most cases.

Table 2: Sentiment classification accuracy results.

Table 3: Domain adaptation results on Amazon.

Table 4: Ablation study results on SST2-S. CNF represents the CNF equivalent of the constituency tree generated by the Berkeley parser; its tree F1 is the upper limit of this value.

Table 5: Accuracy of different grammars on SST2-S.