Sophia Pan


2025

A Generalizable Rhetorical Strategy Annotation Model Using LLM-based Debate Simulation and Labelling
Shiyu Ji | Farnoosh Hashemi | Joice Chen | Juanwen Pan | Weicheng Ma | Hefan Zhang | Sophia Pan | Ming Cheng | Shubham Mohole | Saeed Hassanpour | Soroush Vosoughi | Michael Macy
Findings of the Association for Computational Linguistics: EMNLP 2025

Rhetorical strategies are central to persuasive communication, from political discourse and marketing to legal argumentation. However, analysis of rhetorical strategies has been limited by reliance on human annotation, which is costly, inconsistent, and difficult to scale. The associated datasets are often limited to specific topics and strategies, posing challenges for robust model development. We propose a novel framework that leverages large language models (LLMs) to automatically generate and label synthetic debate data based on a four-part rhetorical typology (causal, empirical, emotional, moral). We fine-tune transformer-based classifiers on this LLM-labeled dataset and validate their performance against human-labeled data on this dataset and on multiple external corpora. Our model achieves high performance and strong generalization across topical domains. We illustrate two applications of the fine-tuned model: (1) improving persuasiveness prediction by incorporating rhetorical strategy labels, and (2) analyzing temporal and partisan shifts in rhetorical strategies in U.S. Presidential debates (1960–2020), which reveals increased use of affective over cognitive argument.