FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM’s improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.


Introduction
Syntax, the form of language, plays a crucial role in all aspects of natural language and communication, whether explicitly or implicitly. In storytelling, writers often have their own styles rooted in different syntactic tendencies (Feng et al., 2012), allowing syntax to serve as an indicator in prediction tasks such as gender (Sarawgi et al., 2011) and authorship (Raghavan et al., 2010) attribution. Syntax also has social connotations in different cultures; for example, in Russia, different social and demographic groups tend to use different syntactic patterns (Bogdanova-Beglarian et al., 2016). Such examples make syntactic consistency crucial to capture in tasks such as machine translation and dialogue generation so that social conventions are not lost. Yet, recent research focuses primarily on evaluating similarity and coherence in terms of dimensions like semantics, the meaning behind language (with approaches such as BERT embeddings (Reimers and Gurevych, 2019; Zhang et al., 2019b)), or lexical overlap (e.g., BLEU (Papineni et al., 2002)), even in work which uses syntax as an input to improve translation quality (Zhang et al., 2019a). A lack of work on syntax can be partially attributed to the absence of a practical, efficient metric that specifically compares syntax at the utterance level. The current standard metric is CASSIM (Boghrati et al., 2016, 2018), but CASSIM uses a computationally expensive distance metric and can yield inconsistencies when comparing syntactically dissimilar documents (e.g., Figure 1). To address this issue, we introduce FastKASSIM, a Fast tree Kernel-bAsed Syntactic SIMilarity Metric, an improved metric for syntactic similarity at the utterance- and document-level.

[Figure 1 example — Utterance 1: "When we hate, we always move away from the grace of God. When we become resentful and unforgiving, the world around us seems spiteful and meaningless."]
Like its predecessor CASSIM, FastKASSIM computes the constituency parse tree of each sentence in a pair of documents, and the similarity between each pair of parse trees. But, while CASSIM used Edit Distance (Pawlik and Augsten, 2011; Zhang and Shasha, 1989) for similarity, we propose using a Label-based Tree Kernel (henceforth LTK), our more syntactically thorough implementation of the Fast Tree Kernel (Moschitti, 2006). We evaluate FastKASSIM against CASSIM and Linguistic Style Matching (henceforth LSM; Niederhoffer and Pennebaker (2002); Ireland and Pennebaker (2010)). We find that FastKASSIM is more robust in cases of dissimilarities between documents and generally agrees more closely with human perception of differences in syntax. Additionally, the runtime of LTK is much faster than that of Edit Distance; it scales linearly with the number of node pairs with the same label in a pair of parse trees. We empirically show large improvements in runtime with FastKASSIM.
Previously, it was difficult to observe the role of syntax in behavioral phenomena at scale due to runtime constraints. Here, we contribute a study of hypotheses in two sets of applications. First, we examine the relationship between the persuasiveness of online arguments and syntactic similarity; second, we observe the viability of syntax as an indicator in authorship attribution. FastKASSIM unlocks potential for evaluative use in more contexts where it is important to preserve syntactic consistency and writing style, e.g., style transfer, machine translation, and story generation.

Related Work
A few early studies focused solely on capturing syntactic structures. Sagae and Gordon (2009) sought to cluster words by syntactic similarity. In order to establish a distance metric, they computed the cosine distance between vector representations of their unique constituency parses. Other approaches have used LIWC (Tausczik and Pennebaker, 2010). Danescu-Niculescu-Mizil et al. (2011) took a probabilistic approach to measure symmetry and influence of linguistic style.
Other early analytical work found that people will adjust their syntax to match dialog systems' syntactic (Stoyanchev and Stent, 2009) and lexical (Stoyanchev and Stent, 2009;Hoshida et al., 2017) choices. Reitter et al. (2006) found that individual syntactic productions would repeat at low "distances" across utterances in both task-oriented and "spontaneous" dialog. For instance, Reitter and Moore (2007) found that syntactic priming was predictive of success on the HCRC Map Task (Anderson et al., 1991). Baker et al. (2021) similarly discussed the use of syntactic similarity and overall linguistic style synchrony as an indicator of trust and cohesion in teamwork settings, and Boncz (2019) used syntactic similarity in modeling cognitive alignment.
Syntactic features have been shown to improve prediction performance in downstream tasks, e.g., authorship attribution (Posadas-Duran et al., 2014; Raghavan et al., 2010; Zhang et al., 2018) and gender attribution (Sarawgi et al., 2011). In each case, the studies found significant performance gains from models that included syntax features. Despite interest in syntax and clear improvements in prediction performance, the vast majority of recent work primarily focuses on semantic or even lexical similarities and differences. This ranges from traditional methods such as TF-IDF or Jaccard similarity to modern approaches including BERT embeddings (Devlin et al., 2019; Reimers and Gurevych, 2019; Zhang et al., 2019b) and AMR kernels (Opitz et al., 2021). Some approaches use syntactic features specific to certain domains such as Twitter (Alnajran, 2019; Little et al., 2020) or web documents (Broder et al., 1997; Pereira and Ziviani, 2003). Other metrics include "syntactic elements" which take on various forms of parts-of-speech aggregation (Alnajran, 2019; Pakray et al., 2011).
A syntactic similarity metric should appropriately consider differences in syntactic structure at the word-, utterance-, and document-level, as opposed to aggregating parts-of-speech or relying on domain-specific features. To our knowledge, CASSIM is the only metric to do this, and it has been proposed as a solution in applications ranging from measuring communicative alignment (Boncz, 2019; Baker et al., 2021) to evaluating stylistic creativity in language learning (Kokkola and Rydström, 2022) to clustering text (Boghrati et al., 2017). However, CASSIM relies on the expensive Edit Distance metric, and occasionally assigns high similarity scores to documents that appear syntactically dissimilar. An improved syntactic similarity metric would afford new opportunities, from creating novel syntax feature vectors for classification tasks (e.g., authorship and gender attribution) to measuring syntactic coherence in machine translation.

CASSIM

CASSIM (Boghrati et al., 2018) was the first metric to compute syntactic similarity at the document-level. Its original algorithm uses the Stanford Parser (Klein and Manning, 2003; Chen and Manning, 2014) to compute the parse tree for each sentence in a pair of documents, before computing the Edit Distance (Wagner and Fischer, 1974) between each parse tree pairing. Then, it constructs a bipartite graph and uses the Hungarian Algorithm for minimum cost assignment (Kuhn, 1955) to pair each tree in one document with the lowest-distance tree in the second document. Finally, it averages the Edit Distances of the minimal cost pairings. When the documents have different numbers of sentences, the number of assignments corresponds to the number of sentences in the document with fewer sentences. Each of that document's sentences is paired with the most similar sentence in the second document, and the least similar sentences in the second document remain unpaired.
The final Edit Distance between a pair of parse trees P_1, P_2 is normalized as EditDistance / (Size(P_1) + Size(P_2) − 2), where Size(P) is the number of nodes in P.
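The pairing-and-averaging procedure above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the helper names are hypothetical, the distance matrix is assumed precomputed, and a brute-force search over permutations stands in for the Hungarian algorithm used by the real metric.

```python
# Sketch of CASSIM-style document scoring (illustrative, not the released code).
from itertools import permutations


def normalized_edit_distance(dist, size1, size2):
    # Normalize a raw tree Edit Distance by the combined tree sizes.
    return dist / (size1 + size2 - 2)


def cassim_score(dist_matrix):
    """dist_matrix[i][j]: normalized edit distance between sentence i of the
    shorter document and sentence j of the longer one (so rows <= columns).
    Returns the average distance of the minimum-cost pairing."""
    n, m = len(dist_matrix), len(dist_matrix[0])
    # Minimum-cost assignment; brute force over permutations suffices for a
    # sketch, while the real metric uses the Hungarian algorithm.
    best = min(
        sum(dist_matrix[i][perm[i]] for i in range(n))
        for perm in permutations(range(m), n)
    )
    return best / n  # average distance over the paired trees
```

In practice, `scipy.optimize.linear_sum_assignment` would replace the permutation search for anything beyond a handful of sentences.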
An important advantage of CASSIM is that it is generalizable to any corpus; it does not represent syntax using platform-specific features like Alnajran (2019) or Little et al. (2020). However, exhaustively computing a metric such as Edit Distance is costly: its implementations range in asymptotic time complexity from Θ(mn) (Wagner and Fischer, 1974) to O(s × min(m, n)) (Ukkonen, 1985), where m and n are the string lengths and s is the maximal Edit Distance.

The FastKASSIM Algorithm
In large multi-sentence documents, repeated Edit Distance computation becomes the most expensive component of CASSIM. Thus, we propose FastKASSIM, which avoids the expensive Edit Distance computation by using Tree Kernels (Moschitti, 2006). Tree Kernels can greatly reduce time complexity by caching between recursive subcalls; the Fast Tree Kernel algorithm (Moschitti, 2006) runs in linear time on average with respect to parse tree sizes.
FastKASSIM replaces CASSIM's Edit Distance with a new normalized Tree Kernel. Figure 2 provides a high-level overview of FastKASSIM, which is formally described in Algorithm 1. However, the Fast Tree Kernel does not allow for the case in which two parse tree nodes have matching labels but different productions. We thus also introduce the Label-based Tree Kernel (LTK), which compares the labels at each node in a pair of subtrees or subset trees (a subtree includes a node and all of its descendants, whereas a subset tree does not require its leaves to be terminal). This also more closely follows Collins and Duffy (2002), who propose comparing the actual subset trees rooted at two nodes in each parse tree, rather than the production. Figure 3 depicts the LTK algorithm computing the number of shared subset trees in a pair of parse trees. More formally, as described in lines 7-11 of Algorithm 1, LTK accumulates the value of ∆_lb (Algorithm 2), which is the number of common fragments rooted in a pair of parse tree nodes n_1, n_2. We follow Moschitti (2006) by normalizing LTK(T_1, T_2) as LTK(T_1, T_2) / sqrt(LTK(T_1, T_1) × LTK(T_2, T_2)). This normalized tree kernel is not biased towards tree shape, in contrast to CASSIM's normalized Edit Distance, EditDistance / (Size(P_1) + Size(P_2) − 2). Under CASSIM's normalization, if two sentences result in the same parse tree size despite being composed of entirely different labels, the normalized Edit Distance approaches 0.50 (the expression approximates Size(P_1) / (2 × Size(P_1) − 2)).

Figure 2: A high-level illustration of FastKASSIM computation. The parse trees of all sentence pairs between D_1, D_2 are compared using LTK. The Hungarian algorithm is used to pair together the most similar parse trees of each sentence in the two documents by the "maximal cost" (i.e., the largest tree kernels). The score is normalized by summing the paired kernel values, then dividing by the number of sentences in the document with more sentences. D_2 S_3 is unpaired because D_1 S_1, D_2 S_1 and D_1 S_2, D_2 S_2 are paired, and D_2 has more sentences than D_1.
In other words, according to CASSIM, sentences with the same shape but different parts-of-speech should be neither similar nor dissimilar. We further highlight possible examples of this bias by visualizing parse trees in Appendix D. Ultimately, our normalized LTK results in significant runtime improvements over Edit Distance, as we show in Section 3.3 and derive in Appendix A, and agrees strongly with human perception, as we show in Section 4.
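FastKASSIM's kernel normalization (dividing by the geometric mean of the self-kernels, following Moschitti (2006)) can be sketched as follows. The stand-in kernel here merely counts shared node labels; the real LTK counts shared subset trees, and all names are illustrative.

```python
# Sketch of kernel normalization (toy stand-in kernel, not the real LTK).
import math
from collections import Counter


def toy_label_kernel(t1, t2):
    # Stand-in kernel: number of label pairs shared between two flattened
    # trees (the real LTK counts shared subset trees instead).
    c1, c2 = Counter(t1), Counter(t2)
    return sum(c1[label] * c2[label] for label in c1)


def normalized_kernel(k, t1, t2):
    """Scale kernel scores into [0, 1] so that a tree compared against
    itself scores exactly 1, regardless of tree size or shape."""
    return k(t1, t2) / math.sqrt(k(t1, t1) * k(t2, t2))
```

The same normalization wraps any symmetric, positive kernel, which is why FastKASSIM can swap in alternative tree kernel implementations.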
Like the original CASSIM algorithm, our high-level algorithm allows for flexibility in the choice of parser, allowing for future improvements in runtime and correctness as research in parsing progresses. FastKASSIM similarly allows for flexibility in the implementation of tree kernels. Our implementation will be publicly released upon acceptance. In order to directly compare FastKASSIM and CASSIM, we default to using the Stanford Parser (Chen and Manning, 2014) and LTK, our aforementioned modified approach to the Fast Tree Kernel (Moschitti, 2006).

Overall Metric Runtime Comparison
The largest difference in the runtime of CASSIM and FastKASSIM is that CASSIM uses a normalized Edit Distance to evaluate parse tree similarity, while FastKASSIM uses a normalized tree kernel.
LTK recursively computes ∆_lb across all n_1, n_2 pairs in parse trees P_1, P_2. Importantly, all comparisons are cached to avoid repetition. This results in LTK having an asymptotic runtime complexity of O(S_12), where S_12 is the total number of node pairs in P_1, P_2 that share the same label. We prove this runtime in Appendix A. This is a large improvement over Edit Distance's runtime complexity of O(s × min(m, n)). We confirmed that these asymptotic improvements apply to real-world use cases by comparing how Edit Distance and LTK scale with the product of parse tree sizes in Figure 6 of the Appendix, finding that LTK scales sublinearly while Edit Distance scales superlinearly.

Figure 3: Overview of the Label-based Tree Kernel. The parse trees of a pair of sentences are computed, along with the number of common fragments rooted in each pair of parse tree nodes (i.e., the number of shared subset trees). This is normalized by dividing by the square root of the product of the number of subset trees in each parse tree.
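The recursion-plus-caching pattern can be illustrated with a toy memoized kernel over tuple-encoded trees of the form `(label, child, child, ...)`. This is a sketch of the idea, not the exact ∆_lb of Algorithm 2; the tree encoding and base cases are simplifying assumptions.

```python
# Toy memoized label-based kernel (illustrates caching, not the real Delta_lb).
from functools import lru_cache


def ltk(t1, t2):
    """Sum a Delta-style count of shared fragments over all node pairs of
    two tuple trees; repeated node-pair comparisons are served from cache."""

    @lru_cache(maxsize=None)
    def delta(n1, n2):
        if n1[0] != n2[0]:
            return 0  # labels differ: no shared fragments rooted here
        if len(n1) == 1 or len(n1) != len(n2):
            return 1  # matching labels but no comparable child sequences
        prod = 1
        for c1, c2 in zip(n1[1:], n2[1:]):
            prod *= 1 + delta(c1, c2)  # cached across repeated subcalls
        return prod

    def nodes(t):
        yield t
        for child in t[1:]:
            yield from nodes(child)

    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))
```

Because `delta` is memoized over hashable node pairs, the work is bounded by the number of distinct same-label pairs rather than by full re-recursion, mirroring the O(S_12) argument above.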
While Figure 6 indicates that LTK can be up to an order of magnitude faster than Edit Distance, the largest bottleneck in overall time is still the time to compute each parse tree. Thus, in Figure 4, we investigated the difference in "end-to-end" runtime between FastKASSIM and CASSIM without precomputing the parse trees.
In this experiment, we use the ChangeMyView dataset (henceforth CMV; Tan et al. (2016)), a corpus of unstructured text with large document sizes, to evaluate how well FastKASSIM and CASSIM process entire documents. First, we sample document pairs and record the time it takes to compute the syntactic similarity of each pair. Each pair is randomly sampled from the 18,363 posts in the CMV training set. Then, we exhaustively pair documents based on the product of their document sizes, which approximates the number of comparisons between parse trees. Because the document lengths of CMV root posts have high variance, document length products are grouped into bins. For each bin, we randomly sample 60 document pairs and report the average runtime. Figure 4 shows that FastKASSIM scales well in runtime as the product of document lengths increases. For instance, when comparing the syntactic similarity of documents of lengths 300 and 310 words (a product of 93,000), CASSIM needed 113.3 seconds on average while FastKASSIM took only 21.3 seconds on average. Given these drastic improvements in time complexity, it is now more feasible to compute syntactic similarity at the document level for large corpora.
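The binning procedure above can be sketched as follows. This is a minimal illustration; the bin width and record format are arbitrary choices, not those used in the actual experiment.

```python
# Sketch of grouping runtimes by document-length product (illustrative only).
from collections import defaultdict


def bin_runtimes(records, bin_width=10_000):
    """records: iterable of (len1, len2, runtime_seconds). Groups each
    record by the product len1 * len2 and reports the mean runtime per bin,
    mirroring the setup behind Figure 4."""
    bins = defaultdict(list)
    for len1, len2, runtime in records:
        bins[(len1 * len2) // bin_width].append(runtime)
    return {b: sum(times) / len(times) for b, times in bins.items()}
```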

Evaluating FastKASSIM
In this section, we first demonstrate FastKASSIM's overall ability to differentiate between similar and dissimilar documents. Then, we correlate its scores with CASSIM and LSM. Finally, we discuss FastKASSIM's advantages by explaining discrepancies in scoring.

Discriminating Between Syntactically Similar and Dissimilar Documents

Boghrati et al. (2018) validated CASSIM by comparing whether it was consistent with human perception of syntactic similarity. The authors asked Mechanical Turkers to write syntactically similar sentences given a sentence prompt. This resulted in a dataset of 472 English documents from 118 anonymous human annotators, and the authors found that CASSIM was able to "identify syntactically similar documents." Following Boghrati et al. (2018), we computed the syntactic similarity between each pairing of sentences generated both within the same prompt and between different prompts. Each individual prompt is structurally quite different (Table 1). By construction, documents resulting from the same prompt should be syntactically similar, whereas documents resulting from different prompts should be dissimilar.

Table 1: Prompts.
1. The two most important days in your life are the day you are born and the day you find out why. The nice thing about being a celebrity is that you bore people and they think it's their fault.
2. I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.
3. When we love, we always strive to become better than we are. When we strive to become better than we are, everything around us becomes better too.
4. What is the point of being alive if you don't at least try to do something remarkable?

As per their work, we fit a maximal structure linear mixed effect model with an indicator for whether the sentence corresponded to the same prompt or a different prompt as a fixed effect and the document ID as a random effect against standardized syntactic similarity.
We computed an ANOVA of this full model against the reduced model, which drops the comparison-type indicator as a fixed effect. The ANOVA χ² test on the impact of comparison type yields statistically significant differences in the distribution of FastKASSIM (χ² = 1438.9, p < 0.001), CASSIM (χ² = 331.84, p < 0.001), and LSM (χ² = 201.85, p < 0.001) scores between syntactically similar and dissimilar documents. FastKASSIM results in the largest χ² statistic, 1438.9, indicating that it creates the largest differences in distribution.
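The model comparison above reduces to a likelihood-ratio test with one degree of freedom. A minimal sketch of the χ² computation, assuming log-likelihoods obtained from fitted full and reduced models (e.g., from a mixed-effects package; the function names here are illustrative):

```python
# Sketch of a likelihood-ratio chi-squared test with df = 1.
import math


def likelihood_ratio_chi2(llf_full, llf_reduced):
    """Chi-squared statistic for comparing a full model against a reduced
    model that drops one fixed effect (one degree of freedom)."""
    return 2.0 * (llf_full - llf_reduced)


def chi2_sf_df1(x):
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x/2)),
    # since X is the square of a standard normal variable.
    return math.erfc(math.sqrt(x / 2.0))
```

For example, `chi2_sf_df1(3.84)` recovers the familiar 0.05 significance cutoff for one degree of freedom.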

Correlating Syntax Metrics
LSM is a metric computing similarities from function word categories; we compute it using the implementation publicly available at https://github.com/miserman/lingmatch. LSM has ties to matching syntax, as matching function words correspond to specific parts-of-speech. Moreover, LSM is a widely accepted metric for synchrony and correspondence of general linguistic style in documents (e.g., Chartrand et al. (2005); Ludwig et al. (2013)). We examine the actual similarity scores calculated in the previous section on the crowdsourced document similarity corpus collected by Boghrati et al. (2018) using each of LSM, FastKASSIM, and CASSIM. (For the mixed-effect models in the previous section, scores were z-score standardized as (x − µ)/σ.) In Figure 5, we see a moderately strong correlation between FastKASSIM and LSM (R = 0.5, p < 0.001), indicating that FastKASSIM is able to detect matches in key parts-of-speech. We would not expect to see a much higher correlation, because LSM is a measure of function words rather than a holistic measure of syntax. On the other hand, while we see a statistically significant correlation between CASSIM and LSM, its correlation coefficient is much lower (R = 0.11, p < 0.001), indicating a smaller connection between CASSIM representations and function words. This is likely due to the biased Edit Distance normalization discussed in Section 3.1. Moreover, we actually find an overall negative correlation between FastKASSIM and CASSIM (R = −0.33, p < 0.001) with a seemingly bipartite relationship: the metrics disagree over documents that FastKASSIM deems dissimilar, and agree over documents that FastKASSIM deems similar. Figure 5 indicates several regions of disagreement (vertical clustering). In one region of Figure 5a, FastKASSIM assigned low scores (less than 0.4) despite LSM ranging from 0.258 to 0.842.
Recall that in this corpus, every utterance resulting from the same prompt was perceived as syntactically similar, and utterances from two different prompts were perceived as syntactically dissimilar. We find that in 248 out of 249 cases where FastKASSIM assigned a score below 0.4 yet LSM assigned a high score (above 0.6), the two documents being compared came from different prompts. This implies that FastKASSIM is indeed discriminating between different syntactic structures, whereas LSM may be picking up on similarities other than syntax, as expected. Figure 5b does not indicate any obvious relationship between CASSIM and LSM. More surprisingly, Figure 5c shows that for document pairings that FastKASSIM deems syntactically dissimilar (values less than 0.5), there is a very strong negative correlation between FastKASSIM and CASSIM (R = −0.783, p < 0.001). Visually, there is a cluster of pairings where FastKASSIM assigns a value less than 0.4 but CASSIM assigns a value larger than 0.75. We find that in 677 of the 678 pairings in this cluster, the documents originate from different prompts, indicating that the documents are syntactically dissimilar.
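The reported R values are plain Pearson correlation coefficients over the paired metric scores. For reference, a self-contained sketch:

```python
# Pearson correlation between two metrics' scores over the same document pairs.
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```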

Discrepancies Across Syntax Metrics
We evaluate these discrepancies in Table 2 in terms of each metric's ability to correctly identify documents originating from the same prompt as syntactically similar and those from different prompts as syntactically dissimilar, using accuracy, recall, and precision (exact expressions are provided in Appendix B). In addition to CASSIM and LSM, we include an evaluation of BERTScore (Zhang et al., 2019b) and Sentence-BERT (Reimers and Gurevych, 2019). While they are not syntax metrics, they are strong embedding-based metrics which may account for syntactic forms.
As similarity scores lie between 0.0 and 1.0, we use 0.5 as a boundary between similar and dissimilar documents, reasoning that in unknown contexts 0.5 is neutral. Due to the apparent biases in Figure 5, we also compare against quantile transformations of each baseline metric, which map each metric's scores to a uniform distribution (note this is an unfair advantage, since in real-time deployment one cannot observe the entire distribution of values). In Table 2, we find that FastKASSIM holistically outperforms both CASSIM and LSM, with the exception of similar-document recall and dissimilar-document precision. In these cases, CASSIM achieves perfect precision and recall because it classified only one document pair as dissimilar. BERTScore similarly yields a small range of similarity scores on this corpus. After a quantile transformation, BERTScore's sensitivity to syntactic differences is magnified, and it performs well in the aforementioned categories but underperforms FastKASSIM in accuracy, similar-document precision, and dissimilar-document recall.
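The quantile transformation described above can be sketched as a rank-based remapping (a minimal illustration; the actual adjustment may handle ties and interpolation differently):

```python
# Rank-based quantile transformation: map scores onto (0, 1] by rank.
def quantile_transform(scores):
    """Approximately uniformize a score distribution. Note the whole
    distribution must be visible up front, which is exactly why this
    adjustment is unavailable 'in the wild'."""
    order = sorted(scores)
    n = len(scores)
    # Ties receive the rank of their first occurrence in sorted order.
    return [(order.index(s) + 1) / n for s in scores]
```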
As indicated by its low dissimilar-document recall in Table 2, we found that CASSIM frequently assigned high similarity scores to syntactically different documents. This likely stems from the bias in its normalized Edit Distance discussed in Section 3.1, and we show the significant improvements achieved by FastKASSIM.
Overall, our results indicate that with respect to human intuition of syntax, FastKASSIM is more robust than CASSIM, LSM and BERTScore. In Appendix D, we visualize comparisons of several pairs of parse trees along with their FastKASSIM and CASSIM scores.

Applications
FastKASSIM is a more accurate and efficient syntactic similarity metric than the current state-of-the-art, opening the possibility for investigating new hypotheses in data-heavy fields with large corpora. Existing applications use syntax metrics for classification (e.g., Posadas-Duran et al. (2014)) as well as analytically to measure hypotheses (e.g., Kaster et al. (2021)). Here, we use syntax as a linguistic style indicator in authorship attribution, and measure syntactic similarity to study communication accommodation in persuasive arguments.

Persuasiveness of Syntax Accommodation
Early work in communication accommodation theory found that matching communication styles can create a sense of familiarity, which improves social and conversational outcomes (e.g., Curhan and Pentland (2007); Giles (2016)). While most existing work has focused on hypotheses at the word-level (Tan et al., 2016), we hypothesize that CMV arguments that are more syntactically similar to opinions may be more persuasive as well (Appendix C includes full details on CMV and preprocessing). On CMV, users write an original post describing an opinion and allow "challengers" to present arguments attempting to change their opinion. Original posters (OPs) indicate whether their opinions have been changed by assigning a "delta," which we can use as an indicator of successful persuasion. We computed the syntactic similarity between a challenger's initial argument and an original poster's (OP) original opinion. This choice is made because the OP presents their full opinion in their original post, and a challenger typically presents their central argument in their initial challenge (Tan et al., 2016). While many CMV studies predict persuasion outcomes, prediction tasks do not reveal the actual bidirectional relationship between syntactic similarity and persuasion. We use FastKASSIM to analyze this relationship.
We find that arguments which eventually lead to deltas (µ = 0.307) tend to be more syntactically similar to original opinions than unsuccessful arguments (µ = 0.263) (t = 19.016; p < 0.001). However, this finding does not on its own imply that syntactically similar arguments are more persuasive. Thus, we also examined the converse by computing the persuasion rates (proportion of threads receiving deltas) of the most syntactically similar and dissimilar arguments. We computed the syntactic similarity of each pairing of initial arguments and original opinions and grouped them into the top and bottom 33% of syntactic similarity. This resulted in a minimum syntactic similarity value of 0.341 for the top 33% (µ = 0.453) and a maximum of 0.171 for the bottom 33% (µ = 0.096). We found that threads in the bottom 33% of syntactic similarity had a persuasion rate of only 6.377%, while the persuasion rate for threads in the top 33% was nearly twice that, 12.347% (t = 19.135; p < 0.001). These findings support the hypothesis that similar syntactic patterns play a role in persuasion and may be an indication of stylistic familiarity for the OP.

Syntax in Authorship Attribution

We next examine the viability of syntax as an indicator in authorship attribution using the Australian High Court Judgment corpus, where the task is to attribute judgments to one of two authors, Rich or McTiernan. To capture semantics, one setting used normalized Bag of Words with Support Vector Machines and the other used a state-of-the-art fine-tuned RoBERTa (Liu et al., 2019) model. We augmented both semantic settings with a syntactic similarity feature vector: for each classification instance, we randomly sampled 25 posts from the training set and computed the FastKASSIM syntactic similarity between judgments written by Rich and McTiernan, respectively. The syntactic similarity features consisted of the minimum, maximum, mean, and standard deviation of these comparisons. We evaluated our classifier on a 10% withheld testing set. Table 3 shows that adding syntactic features to both semantic models results in gains in both accuracy and weighted F1. This is even the case when using RoBERTa; we achieve the strongest performance using RoBERTa with a weighted sum of textual and syntactic features, fine-tuned using modules from the frameworks in Gu and Budhkar (2021) and Wolf et al. (2020). Syntactic similarity with reference documents may provide a strong indicator of writing style.
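The syntactic similarity feature vector described above can be sketched as follows. Function and parameter names are illustrative, not the released API, and the similarity function is assumed to be supplied (e.g., FastKASSIM).

```python
# Sketch of a (min, max, mean, stdev) syntactic-similarity feature vector.
import random
import statistics


def syntax_feature_vector(doc, reference_docs, similarity, k=25, seed=0):
    """Summarize a document's syntactic similarity to up to k randomly
    sampled reference documents as (min, max, mean, stdev); this is the
    shape of the feature vector used to augment the semantic classifiers."""
    rng = random.Random(seed)
    sample = rng.sample(reference_docs, min(k, len(reference_docs)))
    sims = [similarity(doc, ref) for ref in sample]
    return (
        min(sims),
        max(sims),
        statistics.mean(sims),
        statistics.stdev(sims) if len(sims) > 1 else 0.0,
    )
```

One such vector can be computed against each candidate author's reference set and concatenated with the semantic features.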

Conclusion
We have introduced FastKASSIM, whose runtime improvements grow significantly with document size and which achieves better agreement with human perception of syntactic differences than standard syntax metrics. These improvements are possible due to our Label-based Tree Kernel, which has an improved asymptotic runtime complexity and a corrected normalization. FastKASSIM also allowed us to verify hypotheses regarding the importance of syntax both in authorship attribution and in social dynamics such as persuasion. These findings motivate further applications of syntax.

Limitations
Our work relies on a couple of assumptions. Our main corpus for evaluation is the crowdsourced and human-annotated dataset from Boghrati et al. (2018). As a result, our claim to better represent human perception of syntax relies on the assumption that their annotators correctly filtered out responses which are not actually syntactically similar to each prompt; they reported an acceptable Cohen's Kappa of 0.53. Additionally, we only use corpora that are in English. Future work should look towards applying our general approach to other languages.
In our evaluation of FastKASSIM against LSM and CASSIM, we also evaluate its ability to correctly identify statements created from the same prompt as similar and statements created from different prompts as dissimilar (Table 2). In this evaluation, we assume that 0.50 is an acceptable threshold for syntactic similarity and dissimilarity, because without any contextual information one would assume that there are equal numbers of similar and dissimilar documents. Despite this, we still performed quantile adjustments for each comparison metric, uniformly distributing the scores between 0.0 and 1.0. This gives an unfair advantage to the comparison metrics (i.e., CASSIM, LSM, and BERTScore), since "in the wild" it is impossible to obtain the eventual distribution of scores. Future work may consider methods to rebalance each of these scores, including conducting human evaluation to assess whether 0.50 is an acceptable threshold for syntactic similarity both before and after each metric undergoes an adjustment to the uniform distribution.
FastKASSIM is a metric for syntactic similarity between a pair of utterances or documents. However, similarity is only one dimension of syntax, which removes some granularity; for instance, syntactic similarity cannot explain which specific productions are shared. Similarity metrics like FastKASSIM instead afford opportunities in a variety of other applications, such as syntactic coherence in language generation and verifying computational social science hypotheses.
Additionally, parsing is still a significant bottleneck in runtime. Future work may wish to consider ways to mitigate the cost of parsing. One may also consider using sequential modeling to generate syntactic parse trees, or to directly model the output of FastKASSIM.

Ethical Considerations
Our study makes use of three datasets. First is the set of prompts collected in Boghrati et al. (2018), which involved anonymous participants creating fictional statements, so there is no personal information involved. Second is the publicly available r/ChangeMyView dataset collected by Tan et al. (2016), which consists of statements made by users behind typically anonymous aliases. Lastly is the publicly available WikiQA corpus (Yang et al., 2015), which does not contain identifying information.
In our r/ChangeMyView application studying the relationship between syntactic similarity and persuasion, we make the assumption that r/ChangeMyView is a community representative of online arguments. However, partially due to its anonymity, it is unknown whether r/ChangeMyView is a representative sample with diversity in location, educational background, socioeconomic status, ethnicity, and many other important factors. An ideal study should be able to control for proxies for individual traits in order to isolate the impact of syntax itself.
Generally, while most algorithms are not inherently unethical, there is often potential for abuse in their applications. The individual computations in the FastKASSIM algorithm do not have any negative implications, but it is possible to use syntactic similarity for unethical downstream tasks. For instance, because syntax is an important aspect of writing style, users may try to adversarially uncover an anonymous author's identity. We do not condone the use of FastKASSIM for any unlawful or morally unjust activities, and we do not propose any new tasks that would introduce unethical activity.

A Formalizing FastKASSIM

The Label-based Tree Kernel recursively computes ∆_lb across all (n_1, n_2) pairs in parse trees P_1, P_2. The time complexity of ∆_lb(n_1, n_2) has a ceiling of O(L_h^1 × L_h^2), where L_h^i is the number of nodes at height h for the tree rooted at n_i, and h is the minimum height between the two trees. In the worst case, ∆_lb on a fully uncached pair (n_i, n_j) results in recursive calls at every depth level. ∆_lb is computed for each node pair, so the ceiling of the tree kernel runtime is O(NM), where N, M are the total number of nodes in P_1 and P_2, respectively.12 O(NM) is a ceiling assuming the worst case, where the labels are the same at each comparison, requiring full recursion.

Let us consider there to be k shared labels in a pair of parse trees P_1, P_2, over whose pairs of connected components LTK is computed. For k shared labels, in the worst case (i.e., the connected components do not form shared subtrees), we have equation 1:

O(∑_{i=1}^{k} N_i M_i),   (1)

where N_i, M_i are the total number of nodes that have label i in P_1, P_2, respectively. However, recall from Algorithm 2 that LTK only iterates through pairs that share the same label; it does not matter whether the connected components themselves are intertwined. Further simplifying this term, we have equation 2:

O(S_12),   (2)

where S_12 = ∑_{i=1}^{k} N_i M_i is simply the total number of pairs in P_1, P_2 that have the same label. When all nodes have the same label, S_12 = NM, consistent with the observed runtime ceiling. Empirically, we see that the expectation of S_12 is much smaller than NM, as seen by the sublinear time scaling in Figure 6.

12 Note that this is only possible due to ∆_lb caching the repetitive computations when iterating over node pairs.

We first examine the relationship between LTK and NM, the number of possible node pairings, using the WikiQA corpus (Yang et al., 2015), a public dataset containing annotated question and answer pairs written in English. This dataset is chosen in order to compare FastKASSIM and CASSIM on well-structured text, and because of the ability to extract clean sentences of various lengths, which is crucial in determining NM. The experiment utilized 20,347 answer sentences from the WikiQA training set. We compute the parse trees prior to timing Edit Distance and LTK. Statistics of parse trees from the WikiQA-train corpus are shown in Figure 7. As the corpus is rather large, we consider sentences with fewer than 30 nodes, which results in a total of 16,591,680 possible pairings. Pairings are then grouped by the product of their node counts, N × M.
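The shared-label pair count S_12 defined in this appendix can be computed directly from per-tree label counts. The sketch below is illustrative (it is not the paper's implementation, and the function name is hypothetical): for each label shared by the two trees, it multiplies the label's node counts and sums the products.

```python
from collections import Counter

def shared_label_pairs(labels_1, labels_2):
    """S_12 = sum over shared labels i of N_i * M_i, where N_i (M_i) is
    the number of nodes with label i in the first (second) parse tree."""
    c1, c2 = Counter(labels_1), Counter(labels_2)
    # Counter.keys() & Counter.keys() gives the set of shared labels.
    return sum(c1[lab] * c2[lab] for lab in c1.keys() & c2.keys())

# Toy node-label multisets for two small parse trees.
p1 = ["S", "NP", "VP", "NP"]
p2 = ["S", "NP", "VP", "PP"]
print(shared_label_pairs(p1, p2))  # S: 1*1 + NP: 2*1 + VP: 1*1 = 4
```

When every node carries the same label, the sum collapses to N × M, matching the worst-case ceiling above.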
For each value of N × M, we time both Edit Distance and LTK over all pairings if there are fewer than 10 pairs, or over 10 randomly sampled pairs if there are more. The average runtimes of Edit Distance and LTK for each value of N × M are shown in Figure 6.
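The grouping-and-sampling scheme above can be sketched as follows. This is a hypothetical reconstruction of the experimental setup, not the authors' code: pairings are bucketed by the product of their node counts, and at most 10 pairs per bucket are kept for timing.

```python
import random
from collections import defaultdict
from itertools import combinations

def sample_by_node_product(node_counts, cap=10, seed=0):
    """Group sentence pairs by N*M (product of node counts) and keep at
    most `cap` pairs per group, sampling when a group exceeds the cap."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for (i, n), (j, m) in combinations(enumerate(node_counts), 2):
        groups[n * m].append((i, j))
    return {prod: pairs if len(pairs) <= cap else rng.sample(pairs, cap)
            for prod, pairs in groups.items()}

# Four sentences with 5, 6, 5, and 6 parse-tree nodes produce
# products 25 (once), 30 (four times), and 36 (once).
sampled = sample_by_node_product([5, 6, 5, 6])
```

The timing loop would then run Edit Distance and LTK on each retained pair and average per product value.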

B Metrics for Evaluating FastKASSIM, CASSIM, and LSM
In Section 4.3 and Table 2, we evaluated FastKASSIM, CASSIM, and LSM in terms of Similarity Accuracy, Similar Document Recall, Similar Document Precision, Dissimilar Document Recall, and Dissimilar Document Precision. These all follow the standard formulas for accuracy, recall, and precision, adapted to our similarity context:

Similarity accuracy is the number of same-prompt pairs receiving a score greater than 0.50, plus the number of different-prompt pairs receiving a score lower than 0.50, divided by the total number of pairings.

Similar document recall is the number of same-prompt pairs receiving a score greater than 0.50 divided by the total number of pairings originating from the same prompt.

Similar document precision is the number of same-prompt pairs receiving a score greater than 0.50 divided by the total number of pairings receiving a score greater than 0.50.

Dissimilar document recall is the number of different-prompt pairs receiving a score less than 0.50 divided by the total number of pairings originating from different prompts.

Dissimilar document precision is the number of different-prompt pairs receiving a score less than 0.50 divided by the total number of pairings receiving a score less than 0.50.
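These five definitions can be sketched as a single helper under the fixed 0.50 threshold. The function below is illustrative (the name and signature are hypothetical, not from the paper's code); `same_prompt` flags whether each scored pair came from the same prompt.

```python
def similarity_metrics(scores, same_prompt, threshold=0.50):
    """Compute the five evaluation metrics from scores and prompt labels."""
    pred_sim = [s > threshold for s in scores]
    # Same-prompt pairs correctly scored as similar.
    tp = sum(p and g for p, g in zip(pred_sim, same_prompt))
    # Different-prompt pairs correctly scored as dissimilar.
    tn = sum((not p) and (not g) for p, g in zip(pred_sim, same_prompt))
    n_same = sum(same_prompt)
    n_diff = len(same_prompt) - n_same
    return {
        "similarity_accuracy": (tp + tn) / len(scores),
        "similar_recall": tp / n_same,
        "similar_precision": tp / sum(pred_sim),
        "dissimilar_recall": tn / n_diff,
        "dissimilar_precision": tn / (len(scores) - sum(pred_sim)),
    }
```

A production version would guard against empty denominators (e.g., no pair scored above the threshold); the sketch omits this for brevity.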

C Persuasiveness of Syntactic Similarity:
Additional Context

C.1 CMV Background
We investigate the role of syntax in persuasive arguments in the r/ChangeMyView13 (CMV) community on Reddit. CMV users come in "good faith," meaning they are open to changing their view on a controversial topic. They write an original post describing an opinion and allow "challengers" to comment on their post and attempt to change their opinion. If their opinion is changed, the original poster (OP) indicates this by assigning a "delta" (by typing either "!delta" or ∆ in response to the persuasive comment). An OP may choose to present a rebuttal to a challenger, openly disagree with a challenger, or simply ignore a challenger (e.g., Figure 8). All Reddit users use anonymous aliases unless they explicitly disclose their identity.

Earlier work found positive relationships between behavioral mimicry (mirroring behaviors) and outcomes in in-person negotiations (Curhan and Pentland, 2007; Maddux et al., 2008). Yet, Healey et al. (2014) found that in general spoken conversations, people's syntactic patterns diverged from each other. We thus investigate the hypothesis that as a challenger on CMV continues to engage in an argument with an OP, their syntactic communication styles may begin to converge in order to "optimize for social differences." Additionally, we hypothesize that challengers who utilize similar syntactic patterns, whether intentionally or not, may be more persuasive.

C.2 Dataset
We use the CMV dataset consisting of 18,363 posts and 1,114,533 comments written in English and collected by Tan et al. (2016). As in Tan et al. (2016), we examine discussion trees with at least 10 replies from challengers and at least one OP reply, in order to focus on discussions with "non-trivial" amounts of engagement. We also filter out posts which receive more than 10,000 comments in order to reduce noise from "outsiders" in cases where a post goes viral. When a challenger comments on an original post, it starts a "thread," with the original post (the OP's opinion) taking on the root, index 0, and the challenger's comment taking on index 1. Each additional comment made in reply extends the thread. We are interested in syntactic accommodation, so we only consider threads that consist of conversations between the OP and a single challenger, to eliminate confounders. These preprocessing steps result in a final dataset consisting of 15,986 posts.

Figure 8: An example of a thread on CMV. The OP presents a set of arguments defending their opinion (orange, top), inviting challengers to contest their opinion (blue). The OP acknowledges their opinion has been changed by assigning a delta (orange, bottom).
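The post-level filtering steps above can be sketched as a simple predicate. This is a hypothetical illustration; the field names are not the schema of the Tan et al. (2016) dataset.

```python
def keep_post(n_challenger_replies, n_op_replies, n_comments):
    """Keep discussion trees with non-trivial engagement
    (>= 10 challenger replies, >= 1 OP reply) while dropping
    viral posts (> 10,000 comments)."""
    return (n_challenger_replies >= 10
            and n_op_replies >= 1
            and n_comments <= 10_000)

# A post with 12 challenger replies, 3 OP replies, and 250 total comments is kept.
print(keep_post(12, 3, 250))  # True
```

Thread-level filtering (keeping only OP-versus-single-challenger conversations) would then be applied within each retained post.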

C.3 Discussion & Implications
In Section 5.1, we found two highly statistically significant relationships between syntactic similarity and persuasive arguments. First, arguments that eventually lead to deltas tend to be more syntactically similar to original opinions than arguments that do not. Second, the arguments that are most syntactically similar to original opinions were nearly twice as likely to receive deltas as the least syntactically similar arguments. Altogether, this may imply that syntactically similar arguments are more persuasive. This idea is supported by the rich body of work suggesting that similarity and communicative familiarity lead to improved social and conversational outcomes (Curhan and Pentland, 2007; Giles, 2016; Kaptein et al., 2014; Maddux et al., 2008; Wetzel and Insko, 1982).

D Parse Tree Examples
We visualize the constituency parse trees of several sentences taken from the corpus collected in Boghrati et al. (2018) using the online interface of the Berkeley Neural Parser 14 , which uses the parser described in Kitaev et al. (2019).
We first compare the parse trees of the examples provided in Figure 1. Figure 9 compares the parse trees of the two similar documents shown in Figure 1. The first document is composed of one sentence, "I like swimming because it is cool.", and the second document is also composed of one sentence, "I love running because it is fun." CASSIM assigned a score of 0.962, and FastKASSIM assigned a score of 0.928. The structure and composition of these two documents are nearly identical; the only difference is the production associated with the words "running" and "swimming."

Figure 10 is a visualization of the parse trees of the two dissimilar documents shown in Figure 1. The first document is composed of two sentences: "When we hate, we always move away from the grace of God. When we become resentful and unforgiving, the world around us seems spiteful and meaningless." The second document is composed of one sentence: "How can you be skiing if you are already swimming?" Beyond the differing number of sentences, the sentences in the first document individually appear structurally dissimilar compared to the sentence in the second document. FastKASSIM assigned a low score (0.219), whereas CASSIM assigned a high score (0.838).

Figure 11 compares the parse trees of the two single-sentence documents "How can you be skiing if you are already swimming?" and "Knowledge is important to succeed." As is clear from Figure 11, the two sentences are structurally and compositionally quite different. FastKASSIM assigned a score of 0.439, whereas CASSIM assigned a score of 0.679.

Figure 12 compares the parse trees of two separate documents. The first document is composed of two sentences: "When we dream, we often search for deeper meaning. When we search for deeper meaning, other things become more nuanced too." The second document is composed of two sentences as well: "When we concentrate, we try to do better on a task. When we strive to do better, we end up doing better too."
The structures of the two documents appear rather similar, but there do appear to be some differences in composition (i.e., in terms of the constituent parts of speech). FastKASSIM assigned a score of 0.656, and CASSIM assigned a score of 0.837. While both scores are relatively high, FastKASSIM may be more penalizing toward these types of differences.

Figure 13 compares the parse trees of two separate documents. The first document is composed of four sentences: "I am old enough to draw freely upon my experience. Experience is more important than luck. Luck can turn. Experience lasts a lifetime." The second document is composed of one sentence: "Being loving makes you become better." Holistically, the structures of the two documents are quite different. Beyond the differing number of sentences in each document, there are not any individual sentences between the two documents that appear particularly syntactically similar. FastKASSIM assigned a score of 0.15, whereas CASSIM assigned a score of 0.924.

Figure 10: A comparison of the parse trees of two syntactically dissimilar documents. Top document: "When we hate, we always move away from the grace of God. When we become resentful and unforgiving, the world around us seems spiteful and meaningless." Bottom document: "How can you be skiing if you are already swimming?" FastKASSIM similarity score: 0.219; CASSIM similarity score: 0.838.

Figure 11: A comparison of the parse trees of two syntactically dissimilar documents. Top document: "How can you be skiing if you are already swimming?" Bottom document: "Knowledge is important to succeed." FastKASSIM similarity score: 0.439; CASSIM similarity score: 0.679.

Figure 12: A comparison of the parse trees of two syntactically similar documents. Top document: "When we dream, we often search for deeper meaning. When we search for deeper meaning, other things become more nuanced too." Bottom document: "When we concentrate, we try to do better on a task. When we strive to do better, we end up doing better too." FastKASSIM similarity score: 0.656; CASSIM similarity score: 0.837.

Figure 13: A comparison of the parse trees of two syntactically dissimilar documents. Top document: "I am old enough to draw freely upon my experience. Experience is more important than luck. Luck can turn. Experience lasts a lifetime." Bottom document: "Being loving makes you become better." FastKASSIM similarity score: 0.15; CASSIM similarity score: 0.924.