Unveiling the Power of Argument Arrangement in Online Persuasive Discussions

Previous research on argumentation in online discussions has largely focused on examining individual comments and neglected the interactive nature of discussions. In line with previous work, we represent individual comments as sequences of semantic argumentative unit types. However, because it is intuitively necessary for dialogical argumentation to address the opposing viewpoints, we extend this model by clustering type sequences into different argument arrangement patterns and representing discussions as sequences of these patterns. These sequences of patterns are a symbolic representation of argumentation strategies that capture the overall structure of discussions. Using this novel approach, we conduct an in-depth analysis of the strategies in 34,393 discussions from the online discussion forum Change My View and show that our discussion model is effective for persuasiveness prediction, outperforming LLM-based classifiers on the same data. Our results provide valuable insights into argumentation dynamics in online discussions and, through the presented prediction procedure, are of practical importance for writing assistance and persuasive text generation systems.


Introduction
Convincing others can be quite challenging, even when equipped with a comprehensive set of arguments. The questions then arise: what kinds of arguments are the most convincing, and which should be presented first? Should one begin with facts or personal experiences? The different answers to these and related questions are referred to as argumentation strategies (Al-Khatib et al., 2017). Several studies have empirically examined the arrangement of argumentative discourse unit (ADU) types, such as facts or testimonies, in monological texts (e.g., Al-Khatib et al. (2017)), or in individual comments within dialogues (e.g., Hidey et al. (2017); Morio et al. (2019)). These studies have shown that the arrangement of ADU types in a text can serve as a model of the argumentation that underlies the text, for example for predicting its persuasiveness.
[Figure 1 shows an example thread, "CMV: Today is the best time period in human history to be alive for the vast majority of people," with the opening post and two discussion branches annotated with ADU types.]
In dialogues, such as the two discussion branches shown in Figure 1, it seems intuitive that there is no single best argumentation strategy for either side. Instead, the strategy needs to be dynamically adapted in response to the ongoing conversation. For instance, if the opponent concludes their argument with a fact, countering them with another fact before suggesting a policy might be more convincing. Notably, this dialogic nature has been mostly overlooked in the computational analysis of argumentative discussions so far.
In this paper, we examine the nuances of ongoing argumentation in dialogues with the goal of understanding the elements that contribute to the success of individual debaters. Through a detailed analysis of ADU sequences, we seek to reveal the underlying patterns and strategies utilized by skilled debaters. These strategies are essential for advancing both theoretical understanding and practical application. Theoretically, they improve our understanding of persuasive dialogues, enabling the refinement of skills among novice debaters and thereby enhancing the overall quality of discussions. In practical terms, these strategies serve as valuable guidance for the automated generation of persuasive texts that resonate effectively with various audiences. Moreover, a set of core strategies can potentially form the backbone of advanced writing tools, providing substantial support to writers as they structure their arguments.
To reveal these strategies, we introduce a new model for argument arrangement. This model includes identifying particular types of ADUs within a given argumentative discourse, analyzing the sequences of these ADUs, and clustering these sequences to reveal the patterns and strategies. We test our model using a large-scale dataset of persuasive discussions gathered from Change My View.1 We sampled two types of discussion branches, dialogue and polylogue, and identified 16 clusters of similar ADU type patterns, each representing a different strategy used in the discussions. The sequence of clusters for each discussion (one cluster assignment per comment) then serves as a model of the discussion, which we evaluate against the task of determining the persuasiveness of the commenter.2 For this task of persuasiveness detection, our model outperforms several strong baselines.
Overall, this paper introduces a new model for identifying argument arrangement strategies through clusters of similar ADU type patterns (Section 3). We develop a large-scale dataset comprising 34,393 discussion branches and completely tag it for ADU types using a new approach that outperforms the previous state of the art by 0.18 in terms of F1 score (Section 4). Moreover, we use our model to identify clusters representing the arrangement strategies used in these discussions, and show the utility of cluster sequences for argumentation analysis through the example task of predicting persuasiveness (Section 5).3

Related Work
Argumentative discussions play a major role in computational argumentation analysis, for example when detecting counterarguments (Wachsmuth et al., 2018b) or determining whether two arguments are on the same side (Körner et al., 2021). A special emphasis is placed on the Change My View platform due to its extensive user base, strict moderation, and user-provided persuasiveness rating. Several prior works examined individual comments, employing a wide range of linguistic, stylistic, and argumentative features to predict their persuasiveness (e.g., Tan et al. (2016); Hidey et al. (2017); Persing and Ng (2017); Morio et al. (2019)). For analyses beyond individual comments, Ji et al. (2018) employ features of both comments and the corresponding opening post, but do not look at discussions that are longer than one comment to the opening post. Guo et al. (2020) consider whole discussions and employ several linguistic features, but none tied to arrangement. Our paper contributes to the task of predicting persuasiveness by going beyond individual comments and delving into the arrangement of arguments within discussions. Hidey et al. (2017) model unit types for premises (ethos, logos, and pathos) and claims (interpretation, evaluation, agreement, or disagreement), revealing that the relative position of types can indicate a comment's persuasiveness. Morio et al. (2019), on the other hand, do not distinguish between premises and claims. They use the following types: testimony, fact, value, policy, and rhetorical statement. We employ the types of Morio et al. (2019) for their simplicity, their demonstrated impact on persuasiveness, and the easy-to-understand semantics behind the types (cf. Section 4.3), which make them well-suited for modeling argument arrangement.

Argument arrangement in online discussions
The investigation of argument arrangement on social media platforms and its influence on persuasiveness remains a relatively understudied area, with only a few works so far. Hidey et al. (2017) explored whether the types of their proposed model (see above) follow a specific order within Change My View discussions. They identified certain sequential patterns, such as the tendency for pathos premises to follow claims of emotional evaluation. Morio et al. (2019) modeled persuasion strategies by the positional role of argumentative units, also by examining Change My View discussions. They find that facts and testimonies are commonly positioned at the beginning of posts, indicating the importance of presenting factual information before making claims. Conversely, policy suggestions tend to appear towards the end of posts, implying that recommendations or courses of action are treated as conclusions in the argumentation process. The closest work to ours is that by Hidey and McKeown (2018), who investigate the impact of argument arrangement on persuasiveness, albeit only within individual comments. They predict persuasiveness using word-level features, Penn Discourse TreeBank relations, and FrameNet semantic frames. In contrast, our work incorporates a clustering step that facilitates grouping similar sequences of argument unit types, enabling a more thorough exploration of arrangement strategies. In their study of argument arrangement, Al-Khatib et al. (2017) identified evidence patterns in 30,000 online news editorials from the New York Times and associated them with persuasive strategies. Through their analysis, they established specific rules for constructing effective editorials, such as that arguments containing units of type testimony should precede those containing units of type statistics. Wachsmuth et al.
(2018a) emphasized the importance of argument arrangement as a primary means in the context of generating arguments with a rhetorical strategy. They proposed and conducted a manual synthesis of argumentative texts that involves the specific selection, phrasing, and arrangement of arguments following an effective rhetorical strategy. In contrast, our research specifically focuses on persuasive discussions. While there is an overlap in the argument unit types employed between our study and the work of Al-Khatib et al. (2017), the central focus of the latter is on the concept of evidence. Moreover, our findings regarding argument arrangement hold the potential to be incorporated into the approach by Wachsmuth et al. (2018a) for generating persuasive arguments.

Modeling Argument Arrangement
Although various studies have explored the selection and phrasing of arguments in online discussions, often by analyzing the distribution of argument attributes and linguistic features (e.g., Wiegmann et al. (2022)), the effect of argument arrangement has received relatively limited attention. This paper aims to address this research gap by investigating the impact of argument arrangement, not only in predicting persuasiveness but also in gaining a deeper understanding of how individuals engage in persuasive discussions.
In our approach to modeling argument arrangement in opening posts and comments, we employ a three-step pipeline, as illustrated in Figure 2. First, we identify the argumentative discourse units (ADUs) and their corresponding semantic types within the opening post and comments (Section 3.1). Next, we mine and abstract the sequence of ADU types to discover overarching patterns (Section 3.2). Finally, we categorize these patterns into clusters based on their similarity to other patterns (Section 3.3).

ADU Type Identification
For this step, we utilize the five ADU types introduced by Morio et al. (2019): fact, policy, testimony, value, and rhetorical question. One advantage of using these specific ADU types is the availability of a manually annotated dataset that serves as a valuable resource for fine-tuning a Large Language Model to effectively classify the ADU type of a given text. By identifying and categorizing these ADU types, we lay the foundation for understanding the structure and organization of arguments within the discussion.

ADU Type Pattern Mining
Once the ADU types are identified, we examine the sequences in which these types appear within the comments. To enhance the mining of more general and reliable sequences, we incorporate the "Change" transformation proposed by Wachsmuth et al. (2015). This transformation focuses on the transitions, or changes, between ADU types within a sequence. Through this abstraction, we emphasize the shifts in ADU types rather than the specific instances of each type. For example, if a sequence initially consists of [policy, policy, fact], it is abstracted to [policy, fact].
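The Change transformation amounts to collapsing consecutive repetitions of the same ADU type. A minimal sketch (the function name is ours):

```python
def change_transform(adu_types):
    """Abstract an ADU type sequence by keeping only type changes,
    i.e., collapse consecutive repetitions of the same type."""
    abstracted = []
    for t in adu_types:
        if not abstracted or abstracted[-1] != t:
            abstracted.append(t)
    return abstracted

print(change_transform(["policy", "policy", "fact"]))  # ['policy', 'fact']
```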

ADU Type Pattern Clustering
Once the patterns are identified, they are grouped into clusters. The assigned cluster's ID is then considered the sole representation of the arrangement of the opening post or comment. These clusters are identified by applying a clustering algorithm (we use hierarchical agglomerative clustering for its simplicity and adaptability) to all patterns of the dataset.
A critical element for pattern clustering is the measure of distance between patterns. Due to its importance for the clustering and thus for our approach, we propose and evaluate two measures:
Edits The Edits approach calculates the normalized distance between two sequences by quantifying the minimum number of edits (insertions, deletions, or substitutions) needed to transform one sequence into the other. We employ the widely used Levenshtein distance to determine the edit count. To ensure comparability across sequence lengths, the resulting distance is normalized by the length of the longer sequence. Under this normalization, a single edit between two short sequences affects the distance more strongly than a single edit between two long sequences.
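The Edits measure can be sketched in a few lines of pure Python (a minimal version; function names are ours):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions to
    turn sequence a into sequence b (classic single-row dynamic program)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def edits_distance(a, b):
    """Levenshtein distance normalized by the longer sequence length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))

print(edits_distance(["policy", "fact"], ["fact"]))  # 0.5
```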
SGT Embeddings In the Embeddings approach, the ADU type sequences are transformed into a fixed-dimensional space using the Sequence Graph Transform (SGT) technique proposed by Ranjan et al. (2022). This technique captures a range of sequential dependencies without introducing additional computational complexity. The distance between two embedded sequences is then determined using a standard distance metric, such as the Euclidean distance.
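To illustrate the idea behind SGT, here is a simplified re-implementation of its core (not Ranjan et al.'s code; the decay parameter kappa and the normalization are our choices):

```python
import itertools
import math

def sgt_embed(seq, alphabet, kappa=1.0):
    """Simplified Sequence Graph Transform: for every ordered symbol pair
    (u, v), aggregate exp(-kappa * gap) over all positions where u
    precedes v, normalized by the number of such (u, v) position pairs.
    The result has fixed dimension |alphabet|^2, regardless of length."""
    feats = []
    for u, v in itertools.product(alphabet, repeat=2):
        total, count = 0.0, 0
        for i, si in enumerate(seq):
            if si != u:
                continue
            for j in range(i + 1, len(seq)):
                if seq[j] == v:
                    total += math.exp(-kappa * (j - i))
                    count += 1
        feats.append(total / count if count else 0.0)
    return feats

def euclidean(x, y):
    """Euclidean distance between two embedded sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```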
After the distance matrix is calculated using either of the above-mentioned measures, hierarchical clustering can be applied. Initially, each sequence is considered an individual cluster. The algorithm then iteratively merges the most similar clusters based on the distance between them. This process continues until all sequences are merged into a single cluster or until a predefined stopping criterion is met. The resulting hierarchical clustering tree, also known as a dendrogram, provides a visual representation of the clusters' hierarchical relationships. It allows us to observe the formation of subclusters and the overall clustering structure. An optimal number of clusters can be determined either by setting a threshold on the dendrogram or by employing other methods, such as the Elbow method (Thorndike, 1953).
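With SciPy, this clustering step might look as follows (a sketch; the average-linkage choice and the toy distance matrix are ours):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_patterns(dist_matrix, n_clusters):
    """Agglomerative clustering on a precomputed, symmetric pattern
    distance matrix; returns one cluster label per pattern."""
    condensed = squareform(dist_matrix, checks=False)  # matrix -> condensed vector
    tree = linkage(condensed, method="average")        # build the dendrogram
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy distance matrix with two obvious groups: patterns 0/1 and 2/3.
dist = np.array([[0.0, 0.1, 0.9, 0.9],
                 [0.1, 0.0, 0.9, 0.9],
                 [0.9, 0.9, 0.0, 0.1],
                 [0.9, 0.9, 0.1, 0.0]])
labels = cluster_patterns(dist, 2)
```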
After performing the clustering, the arrangement is represented by the longest common subsequence of ADU types within each cluster. This way, we can gain insights into the prevailing ADU types and their sequential order, which represents the characteristic arrangement pattern within that particular cluster.
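One way to compute such a representative is to fold a pairwise longest common subsequence over all patterns in a cluster (our reading of the procedure; the fold is an assumption):

```python
from functools import lru_cache

def longest_common_subsequence(a, b):
    """Longest common subsequence of two ADU type sequences."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == len(a) or j == len(b):
            return ()
        if a[i] == b[j]:
            return (a[i],) + rec(i + 1, j + 1)
        left, right = rec(i + 1, j), rec(i, j + 1)
        return left if len(left) >= len(right) else right

    return list(rec(0, 0))

def cluster_representative(patterns):
    """Fold the pairwise LCS over all patterns in a cluster to obtain a
    characteristic arrangement pattern."""
    rep = patterns[0]
    for p in patterns[1:]:
        rep = longest_common_subsequence(rep, p)
    return rep
```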

Preparing a Dataset of Argument Arrangements in Online Discussions
To evaluate the effectiveness of our argument arrangement model, we applied it to real-world online discussion data, specifically focusing on the task of predicting persuasiveness. We collected a dataset comprising 34,393 discussion branches, extracted from 1,246 threads sourced from the Change My View online discussion platform via the public Reddit API. These discussions adhere to a structured format enforced by the community, with persuasive discussions being identified by the presence of a ∆ character, in accordance with community guidelines, which serves as our ground truth (Section 4.1). For our analysis, we categorize discussions into two scenarios, dialogue and polylogue (Section 4.2), and automatically identify argumentative discourse units (ADUs) and their respective semantic types (Section 4.3). Subsequently, we offer initial insights into the use of ADU types through a statistical analysis of their positional distribution within a discussion branch (Section 4.4).

Structure of Change My View Discussions
Change My View provides a dynamic and inclusive platform that encourages users to engage in thoughtful discussions, with a strong emphasis on presenting well-reasoned arguments. The discussions are carefully moderated according to the community rules,4 ensuring that the discussions maintain a high standard of quality and remain focused on the topic at hand. On the Change My View platform, discussions, known as threads, begin when a user, referred to as the original poster, submits a text-based opening post expressing and justifying their viewpoint on any controversial topic. As implied by the name of the platform, each such submission is a challenge to other users, known as commenters, to change the original poster's view by submitting textual comments as replies. Both the original poster and commenters have the ability to contribute additional comments. The original poster may choose to add comments to further defend their justifications, while other commenters can provide counterarguments or rebuttals. This iterative commenting process can lead to long chains of discussion. For the purposes of this paper, we define each path from the opening post to a comment without any subsequent replies as a branch. These branches represent distinct trajectories within the discussion, highlighting the progression and development of the conversation from the opening post to subsequent comments. When the original poster perceives that a commenter has successfully influenced their viewpoint, even if only to a minor extent, they indicate this by posting a response that includes the ∆ character. In line with prior research, we consider only those branches that contain a ∆ in a comment from the original poster as persuasive.5
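Branch extraction thus amounts to enumerating root-to-leaf paths in the thread's comment tree. A sketch (the tuple-based thread layout is purely illustrative; the Reddit API returns richer objects):

```python
def branches(comment_tree, path=None):
    """Enumerate all discussion branches: root-to-leaf paths in a thread,
    where each node is (comment_id, [child subtrees])."""
    node_id, children = comment_tree
    path = (path or []) + [node_id]
    if not children:
        yield path          # a comment without replies ends a branch
    else:
        for child in children:
            yield from branches(child, path)

# Opening post with two branches: OP -> U1 -> OP, and OP -> U2.
thread = ("OP", [("U1-a", [("OP-a", [])]), ("U2-a", [])])
print(list(branches(thread)))  # [['OP', 'U1-a', 'OP-a'], ['OP', 'U2-a']]
```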

Grouping Discussions by Scenario
For an in-depth analysis of persuasiveness, we categorize the branches into two subsets based on different discussion scenarios:
Dialogue In this scenario, discussions unfold as a sequential exchange of arguments between the original poster and a single unique commenter. Here, both parties take turns presenting their viewpoints and counterarguments in a back-and-forth manner.
Polylogue This scenario encompasses discussion branches where multiple commenters engage with the original poster. These interactions involve multiple participants attempting to persuade the original poster, resulting in a dynamic, multi-party conversation.
This differentiation allows us to examine the effectiveness of persuasive discourse in different discussion scenarios and gain insights into the dynamics of persuasion. Table 1 provides key numbers for each scenario: dialogues more often contain a ∆ and are on average shorter than polylogues. Moreover, out of the 1,246 total threads, nearly all contain at least one dialogue branch (1,236) and the majority contain at least one polylogue branch (1,058).
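The scenario assignment reduces to counting the unique commenters along a branch (a sketch with an assumed author-list representation):

```python
def scenario(branch_authors):
    """Label a branch as dialogue (OP plus exactly one unique commenter)
    or polylogue (OP plus several commenters). branch_authors is the
    list of author names along a branch, starting with the original poster."""
    op = branch_authors[0]
    commenters = {a for a in branch_authors[1:] if a != op}
    return "dialogue" if len(commenters) == 1 else "polylogue"

print(scenario(["OP", "U1", "OP", "U1"]))  # dialogue
print(scenario(["OP", "U1", "OP", "U2"]))  # polylogue
```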

Automatic Identification of ADU Types
The fundamental building blocks of our approach (cf. Section 3.1) are the semantic ADU types presented by Morio et al. (2019). To identify the ADU types within the collected discussions, we fine-tuned a pre-trained ELECTRA-Large model (Clark et al., 2020) on the human-annotated dataset of Morio et al. (2019). The model was specifically trained for sequence labeling at the post level, enabling it to detect argumentative discourse units and assign the corresponding ADU types. We evaluated the performance of our model on a separate held-out test set from the dataset used by Morio et al. (2019). The model achieved an F1 score of 0.62, surpassing the performance of the original authors' model by 0.18, which is, in our view, sufficiently reliable for large-scale analyses. Given that both our dataset and the one by Morio et al. (2019) were sourced from the Change My View platform, it is reasonable to expect a similar level of performance on our dataset.

Positional Role of ADU Types
The analysis of the human-annotated dataset by Morio et al. (2019) revealed slight differences in the distribution of ADU types between persuasive and non-persuasive comments. They observed that testimonies and facts tend to occur more frequently at the beginning of texts, while policies are more prevalent towards the end. Figure 3 shows the result of reproducing their analysis on our larger dataset, utilizing automatically identified ADU types. As shown in the figure, our dataset exhibits a more balanced distribution of facts, whereas there is a noticeable increase in the usage of values and rhetorical statements towards the end of the discussions.
We attribute these differences to our dataset containing not only the first comment to the opening post but complete discussion branches. Moreover, the figure suggests that even the position of ADU types alone can be a (though rather weak) predictor of persuasiveness. We anticipate that our exploration of ADU type arrangement will further enhance this predictive capability.
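The positional statistic underlying Figure 3 is simply the relative start offset of each ADU within the branch text (the (type, start offset) representation here is our assumption):

```python
def relative_start_positions(adus, text_length):
    """Relative start position of each ADU as a fraction of the overall
    branch text length, as used for the positional analysis."""
    return [(adu_type, start / text_length) for adu_type, start in adus]

# e.g., a fact beginning at character 50 of a 200-character branch
print(relative_start_positions([("fact", 50), ("policy", 180)], 200))
# [('fact', 0.25), ('policy', 0.9)]
```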
In this section, we present the results of modeling the arrangement of argumentative discourse unit (ADU) types as introduced in Section 3.3 and propose several approaches for predicting the persuasiveness of discussion branches. Furthermore, we experimentally validate our findings by incorporating cluster features into the task of persuasiveness prediction.

Identified Arrangement Clusters
Following our approach as described in Section 3.3, we cluster the ADU type patterns in our dataset. Notably, we conducted separate clustering for the patterns found in opening posts and comments, recognizing their distinct pragmatic purposes (justifying a view vs. arguing). To determine an appropriate number of clusters for each, we applied the widely used Elbow criterion (Thorndike, 1953).
Remarkably, this analysis revealed that both distance measures led to the same number of clusters for both opening posts (10 clusters) and comments (6 clusters).
Table 2 provides an overview of the identified clusters. Notably, when employing the SGT embedding similarity, the clusters exhibit a more even distribution of texts. Directly related to that, the most frequent patterns per cluster tend to be shorter when employing SGT embeddings, especially when clustering comments. This indicates that, in this approach, short and frequent patterns are assigned to different clusters rather than concentrated in a single one. This approach thus holds promise for distinguishing various arrangement strategies effectively. Moreover, the table shows that the average length of patterns differs considerably between clusters when using the Edit distance approach, which is also desirable for the same reason. Furthermore, we confirmed that the clusters also exhibit variations in the distribution density of ADU types, as demonstrated in Figure 3 for the entire dataset. The density plots for the clusters based on SGT embeddings can be found in the appendix (Figure 5).

Approaches for Predicting Persuasiveness
In order to evaluate the effectiveness of our argument arrangement model, we employ it in the context of predicting persuasiveness within discussions: given a discussion branch, predict whether the original poster ultimately awards a ∆. We compare the following approaches:

Table 2: Overview of the identified clusters in our dataset, either via the embeddings or the edits distance, and either in the opening posts (OPX) or comments (CX): the number of texts (opening posts or comments) they contain as well as the number of unique patterns, their average length, and the most frequent one.
Length-Based Classifier As observed by Tan et al. (2016) and reflected in our dataset (cf. Table 1), lengthy discussions on Change My View tend to receive fewer ∆ awards. To establish a preliminary baseline, we employ logistic regression to classify discussion branches as persuasive or not based solely on their length.
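A minimal sketch of this baseline with scikit-learn (the toy data below is invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in data: branch length (number of comments) vs. delta label,
# mimicking the observation that shorter branches more often receive a delta.
lengths = np.array([[2], [3], [4], [9], [10], [12]])
delta = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression().fit(lengths, delta)
print(clf.predict([[3], [11]]))
```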
Interplay Features In the study of Tan et al. (2016), the most effective features for persuasiveness prediction were extracted from the interaction between the opening post and the replies. They derived 12 features from 4 similarity measures (common words, similar fraction in the response, similar fraction in the opening post, and Jaccard score) and 3 word subsets (all words, stop words, and content words). As a lexical baseline, we employ the same features and use logistic regression to classify the discussion branches.
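These 12 interplay features can be sketched as follows (the tiny stop-word list is illustrative only; Tan et al. use a standard list):

```python
STOP_WORDS = {"the", "a", "is", "to", "of", "and", "in", "that", "it"}

def interplay_features(op_words, reply_words):
    """Sketch of the interplay features of Tan et al. (2016): four
    similarity measures crossed with three word subsets = 12 features."""
    features = []
    for subset in (lambda w: w,                  # all words
                   lambda w: w & STOP_WORDS,     # stop words only
                   lambda w: w - STOP_WORDS):    # content words only
        op, reply = subset(set(op_words)), subset(set(reply_words))
        common = op & reply
        features += [
            len(common),                                           # common words
            len(common) / len(reply) if reply else 0.0,            # fraction of reply
            len(common) / len(op) if op else 0.0,                  # fraction of opening post
            len(common) / len(op | reply) if op | reply else 0.0,  # Jaccard score
        ]
    return features
```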
BERT-Based Classifier As a baseline that captures both the semantic and syntactic information of the text, we employ a classifier based on BERT large embeddings (Devlin et al., 2019). For this approach, we concatenate the BERT embeddings for each turn in the discussion and pass them through a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) to capture contextual dependencies between different turns. The LSTM's outputs are then processed through a linear layer and softmax activation to predict persuasiveness.
Argument Arrangement: Clusters In this approach, we utilize a bidirectional LSTM model in conjunction with a linear layer and softmax activation to predict persuasiveness labels based on the identified cluster features. The input to the LSTM consists of the sequence of cluster features identified by either the SGT embeddings or the Edit distance approach.
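The arrangement-only classifier can be sketched in PyTorch as follows (a minimal sketch; the hyperparameters and the handling of variable-length input are simplified choices of ours):

```python
import torch
import torch.nn as nn

class ClusterSequenceClassifier(nn.Module):
    """Bidirectional LSTM over the sequence of per-comment cluster IDs,
    followed by a linear layer and softmax."""
    def __init__(self, n_clusters, emb_dim=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(n_clusters, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 2)  # persuasive vs. not

    def forward(self, cluster_ids):          # (batch, seq_len) long tensor
        h, _ = self.lstm(self.embed(cluster_ids))
        return torch.softmax(self.out(h[:, -1]), dim=-1)  # last time step

model = ClusterSequenceClassifier(n_clusters=16)
probs = model(torch.tensor([[0, 3, 7, 3]]))  # one branch of four comments
print(probs.shape)  # torch.Size([1, 2])
```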

Combination of Contextualized Features and Argument Arrangement
The study by Li et al. (2020) demonstrated that incorporating argument structure features into an LSTM model plays an essential role in predicting which round of a debate (Pro vs. Con) makes a more convincing argument on the debate.org website. In a similar manner, to leverage the strengths of both the ADU type arrangements and the BERT embeddings, we combine the outputs of the previous two models. Two linear layers with softmax are used to predict the output probabilities over both of these LSTM models separately.
Llama-2-7B Zero-Shot We utilize the out-of-the-box Llama-2 model with seven billion parameters (Touvron et al., 2023). Given the discussion branch, the model was prompted to generate a binary answer (yes/no) when asked whether the original poster's opinion was altered by the end of the discussion.
Llama-2-7B Fine-Tuned Branches Using the same prompt as in the zero-shot approach, we fine-tune the Llama-2-7B model on the discussion branches, with the true yes/no answers provided in the prompt. We implement low-rank adaptation (Hu et al., 2022) with rank r = 64 and scaling factor α = 16, applied to all linear layers. The model was fine-tuned for one epoch, as further training did not yield a significant reduction in the loss, with a batch size of 8 and a learning rate of 2 × 10^-4 on a single NVIDIA A100 GPU (40GB).
Llama-2-7B Fine-Tuned Branches+SGT Clusters This approach closely mirrors the one above, with a slight variation in the prompt. Here, we extend the prompt with the cluster features identified from the dataset using the SGT Embeddings approach. Figure 4 illustrates the prompt template used in all three configurations.

Results of Predicting Persuasiveness
We present the results of our persuasiveness prediction analysis for all branches under the two scenarios described in Section 4.2. Considering the class imbalance, we evaluate the approaches on the complete dataset as well as on a balanced sample. For the evaluation, we divide the branches into training and test sets, maintaining a ratio of 8:2. The predictions are then assessed on the held-out test set using the macro-F1 measure. Given the heavily skewed label distribution, we emphasize the importance of ∆ prediction and thus provide more detailed evaluation results for this label. The effectiveness of the classifiers is reported in Table 3.
The classification results reveal several findings. First, the length baseline provides a starting point for comparison in the balanced setting, achieving an overall F1 score of 0.56 in both scenarios and an F1 score of 0.69 for the ∆ discussions. While the length baseline provides a basic understanding of the relationship between discussion length and persuasiveness, we recognize the need for more sophisticated methods to capture the nuances of persuasive discourse. Second, our experiments with encoding the discussions using BERT embeddings have yielded promising results; however, incorporating argument arrangement features further enhances the prediction performance. In fact, solely using cluster features to represent the discussion dynamics is already effective for persuasiveness prediction. We see a similar trend with the Llama-2-7B-based approaches, where the performance of the model improves significantly after fine-tuning on the discussion branches, and even more so when incorporating the identified strategy features into the prompt.
The enhanced prediction performance achieved by incorporating argument arrangement features highlights their significance in capturing the intricate dynamics of persuasive discussions.

### Instruction:
Below is a conversation between OP and one or more users along with the persuasion strategies employed by each of them. Read the discussion and decide whether by the end of the discussion the opinion of the OP was changed.

Conclusion
This paper expanded our understanding of online persuasive dialogues by uncovering that debaters follow certain arrangement strategies. We introduced a new model for arrangement strategies that is based on patterns of argumentative discourse unit types. Clusters of such patterns correspond to different strategies, with sequences of these clusters (one element per comment of a discussion branch) representing a whole discussion. This model was operationalized using a large-scale dataset comprising 34,393 discussion branches. In a comparative evaluation of ten approaches, we demonstrated the utility of these arrangement strategies in predicting persuasiveness, both when used as the sole feature and in combination with others, emphasizing their essential role in unraveling the dynamics of persuasive online discussions.
Still, there is ample room for refining and expanding this research. One aspect worth exploring is the development of more fine-grained categories for argumentative discourse units, such as incorporating the human value categories proposed by Kiesel et al. (2022). Besides, the identified arrangement strategies have applications beyond persuasiveness prediction. Exploring their potential in tasks such as writing assistance and text generation could broaden the scope of argumentative discourse analysis.

Limitations
While our research contributes valuable insights into the prediction of persuasiveness and the role of argument arrangement, it is important to acknowledge certain limitations that may impact the generalizability and interpretation of our findings.
ADU Type Classification We rely on automated methods for the identification and classification of argumentative discourse units (ADUs), which may introduce errors or inaccuracies. Despite efforts to ensure accuracy, misclassifications or inconsistencies in ADU labeling could potentially impact the analysis and predictions. Further refinement and validation of the ADU classification process are necessary for more robust results.
Limited Scope of Features Our study primarily focuses on ADU types and argument arrangement features in predicting persuasiveness. While these features have shown promising results, there are certainly other important linguistic, contextual, or stylistic features that were not considered in our analysis. Future research should explore combinations of such features with arrangement features, as we did with BERT embeddings, to better understand their individual and combined impact on persuasiveness prediction.
Only One Platform Our study focuses primarily on the Change My View platform, which may limit the generalizability of our findings to other social media platforms or contexts. The characteristics and dynamics of persuasive discourse may vary across different platforms, user demographics, and topics. Future research should explore the generalizability of our findings to a broader range of platforms and contexts.

Figure 1: Illustration of one Change My View thread with two discussion branches (with commenters U1 and U2, respectively) and detected ADU types. The original poster's (OP) comment with ∆ in the first discussion branch marks that branch as persuasive.

Figure 2: Pipeline for modeling the overall strategy as flows of ADU type arrangements for the opening post and comments, illustrated on the two discussion branches of the example from Figure 1. The clusters are determined based on the abstracted patterns of all opening posts and comments in the dataset, respectively.

Figure 3: Distribution density of relative start positions (as a fraction of overall text length in a branch) for ADUs of each type, separately for branches with ∆ (green) and without (red).

Figure 4: Example of a prompt we employ for the zero-shot and fine-tuning experiments. The text colored in blue is adapted to the conversation at hand. The text colored in red is used only when the identified clusters are employed.
The authors provide the following definitions and examples of the five ADU types: Value (V) is a proposition that refers to subjective value judgments without providing a statement on what should be done ("it is absolutely terrifying"). Policy (P) offers a specific course of action to be taken or what should be done ("intelligent students should be able to see that").

Table 3: Effectiveness of the tested models in predicting whether a discussion branch received a ∆ in the different scenarios, for both the unbalanced (full) and a balanced test set. Employed measures are the macro-F1 score (F1) as well as the F1 score, precision, and recall for detecting delta discussions (F1∆, P∆, R∆).