Introducing Rhetorical Parallelism Detection: A New Task with Datasets, Metrics, and Baselines
Stephen Bothwell
Justin DeBenedetto
Theresa Crnkovich
Hildegund Müller
David Chiang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Rhetoric, both spoken and written, involves not only content but also style. One common stylistic tool is parallelism: the juxtaposition of phrases which have the same sequence of linguistic (e.g., phonological, syntactic, semantic) features. Despite the ubiquity of parallelism, the field of natural language processing has seldom investigated it, missing a chance to better understand the nature of the structure, meaning, and intent that humans convey. To address this, we introduce the task of rhetorical parallelism detection. We construct a formal definition of it; we provide one new Latin dataset and one adapted Chinese dataset for it; we establish a family of metrics to evaluate performance on it; and, lastly, we create baseline systems and novel sequence labeling schemes to capture it. On our strictest metric, we attain F1 scores of 0.40 and 0.43 on our Latin and Chinese datasets, respectively.