Robin Schaefer

2023

pdf bib abs
Communicating Climate Change: A Comparison Between Tweets and Speeches by German Members of Parliament
Robin Schaefer | Christoph Abels | Stephan Lewandowsky | Manfred Stede
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Twitter and parliamentary speeches are very different communication channels, but many members of parliament (MPs) make use of both. Focusing on the topic of climate change, we undertake a comparative analysis of speeches and tweets uttered by MPs in Germany in a recent six-year period. By keyword/hashtag analyses and topic modeling, we find substantial differences along party lines, with left-leaning parties discussing climate change through a crisis frame, while liberal and conservative parties try to address climate change through the lens of climate-friendly technology and practices. Only the AfD denies the need to adopt climate change mitigating measures, demeaning those concerned about a deteriorating climate as climate cult or fanatics. Our analysis reveals that climate change communication does not differ substantially between Twitter and parliamentary speeches, but across the political spectrum.

pdf bib abs
Towards Fine-Grained Argumentation Strategy Analysis in Persuasive Essays
Robin Schaefer | René Knaebel | Manfred Stede
Proceedings of the 10th Workshop on Argument Mining

We define an argumentation strategy as the set of rhetorical and stylistic means that authors employ to produce an effective, and often persuasive, text. First computational accounts of such strategies have been relatively coarse-grained, while in our work we aim to move to a more detailed analysis. We extend the annotations of the Argument Annotated Essays corpus (Stab and Gurevych, 2017) with specific types of claims and premises, propose a model for their automatic identification and show first results, and then we discuss usage patterns that emerge with respect to the essay structure, the “flows” of argument component types, the claim-premise constellations, the role of the essay prompt type, and that of the individual author.

2022

pdf bib abs
GerCCT: An Annotated Corpus for Mining Arguments in German Tweets on Climate Change
Robin Schaefer | Manfred Stede
Proceedings of the Thirteenth Language Resources and Evaluation Conference

While the field of argument mining has grown notably in the last decade, research on the Twitter medium remains relatively understudied. Given the difficulty of mining arguments in tweets, recent work on creating annotated resources mainly utilized simplified annotation schemes that focus on single argument components, i.e., on claim or evidence. In this paper we strive to fill this research gap by presenting GerCCT, a new corpus of German tweets on climate change, which was annotated for a set of different argument components and properties. Additionally, we labelled sarcasm and toxic language to facilitate the development of tools for filtering out non-argumentative content. This, to the best of our knowledge, renders our corpus the first tweet resource annotated for argumentation, sarcasm and toxic language. We show that a comparatively complex annotation scheme can still yield promising inter-annotator agreement. We further present first good supervised classification results yielded by a fine-tuned BERT architecture.

pdf bib
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
Robin Schaefer | Xiaoyu Bai | Manfred Stede | Torsten Zesch
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

pdf bib abs
On Selecting Training Corpora for Cross-Domain Claim Detection
Robin Schaefer | René Knaebel | Manfred Stede
Proceedings of the 9th Workshop on Argument Mining

Identifying claims in text is a crucial first step in argument mining. In this paper, we investigate factors for the composition of training corpora to improve cross-domain claim detection. To this end, we use four recent argumentation corpora annotated with claims and submit them to several experimental scenarios. Our results indicate that the “ideal” composition of training corpora is characterized by a large corpus size, homogeneous claim proportions, and less formal text domains.

2021

pdf bib abs
UPAppliedCL at GermEval 2021: Identifying Fact-Claiming and Engaging Facebook Comments Using Transformers
Robin Schaefer | Manfred Stede
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments

In this paper we present UPAppliedCL’s contribution to the GermEval 2021 Shared Task. In particular, we participated in Subtasks 2 (Engaging Comment Classification) and 3 (Fact-Claiming Comment Classification). While acceptable results can be obtained by using unigrams or linguistic features in combination with traditional machine learning models, we show that for both tasks transformer models trained on fine-tuned BERT embeddings yield best results.

2020

pdf bib abs
Annotation and Detection of Arguments in Tweets
Robin Schaefer | Manfred Stede
Proceedings of the 7th Workshop on Argument Mining

Notwithstanding the increasing role Twitter plays in modern political and social discourse, resources built for conducting argument mining on tweets remain limited. In this paper, we present a new corpus of German tweets annotated for argument components. To the best of our knowledge, this is the first corpus containing not only annotated full tweets but also argumentative spans within tweets. We further report first promising results using supervised classification (F1: 0.82) and sequence labeling (F1: 0.72) approaches.

pdf bib abs
A Two-Step Approach for Automatic OCR Post-Correction
Robin Schaefer | Clemens Neudecker
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

The quality of Optical Character Recognition (OCR) is a key factor in the digitisation of historical documents. OCR errors are a major obstacle for downstream tasks and have hindered advances in the usage of the digitised documents. In this paper we present a two-step approach to automatic OCR post-correction. The first component is responsible for detecting erroneous sequences in a set of OCRed texts, while the second is designed for correcting OCR errors in them. We show that applying the preceding detection model reduces both the character error rate (CER) compared to a simple one-step correction model and the amount of falsely changed correct characters.

Co-authors

Torsten Zesch 1

Clemens Neudecker 1

Venues

ws1

latechclfl1