Manfred Stede - ACL Anthology

2025

From Debates to Diplomacy: Argument Mining Across Political Registers
Maria Poiaganova | Manfred Stede
Proceedings of the 12th Argument Mining Workshop

This paper addresses the problem of cross-register generalization in argument mining within political discourse. We examine whether models trained on adversarial, spontaneous U.S. presidential debates can generalize to the more diplomatic and prepared register of UN Security Council (UNSC) speeches. To this end, we conduct a comprehensive evaluation across four core argument mining tasks. Our experiments show that the tasks of detecting and classifying argumentative units transfer well across registers, while identifying and labeling argumentative relations remains notably challenging, likely due to register-specific differences in how argumentative relations are structured and expressed. As part of this work, we introduce ArgUNSC, a new corpus of 144 UNSC speeches manually annotated with claims, premises, and their argumentative links. It provides a resource for future in- and cross-domain studies and novel research directions at the intersection of argument mining and political science.

Applying the Character-Role Narrative Framework with LLMs to Investigate Environmental Narratives in Scientific Editorials and Tweets
Francesca Grasso | Stefano Locci | Manfred Stede
Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025)

Communication aiming to persuade an audience uses strategies to frame certain entities in ‘character roles’ such as hero, villain, victim, or beneficiary, and to build narratives around these ascriptions. The Character-Role Framework is an approach to model these narrative strategies, which has been used extensively in the Social Sciences and is just beginning to get attention in Natural Language Processing (NLP). This work extends the framework to scientific editorials and social media texts within the domains of ecology and climate change. We identify characters’ roles across expanded categories (human, natural, instrumental) at the entity level, and present two annotated datasets: 1,559 tweets from the Ecoverse dataset and 2,150 editorial paragraphs from Nature & Science. Using manually annotated test sets, we evaluate four state-of-the-art Large Language Models (LLMs) (GPT-4o, GPT-4, GPT-4-turbo, LLaMA-3.1-8B) for character-role detection and categorization, with GPT-4 achieving the highest agreement with human annotators. We then apply the best-performing model to automatically annotate the full datasets, introducing a novel entity-level resource for character-role analysis in the environmental domain.

Predicting Functional Content Zones in German Source-Dependent Argumentative Essays: Experiments on a Novel Dataset
Xiaoyu Bai | Manfred Stede
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers

Heroes, Villains, and Victims: Character Narratives in the WPS Agenda of the UNSC
Hannah Mathilde Steinbach | Imge Yüzüncüoglu | Raluca Rilla | Manfred Stede
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops

Sentence-Alignment in Semi-parallel Datasets
Steffen Frenzel | Manfred Stede
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)

In this paper, we test sentence alignment on complex, semi-parallel corpora, i.e., different versions of the same text that have been altered to some extent. We evaluate two hypotheses: To make alignment algorithms more efficient, we test the hypothesis that matching pairs can be found in the immediate vicinity of the source sentence and that it is sufficient to search for paraphrases in a ‘context window’. To improve the alignment quality on complex, semi-parallel texts, we test the implementation of a segmentation into Elementary Discourse Units (EDUs) in order to make more precise alignments at this level. Since EDUs are the smallest possible unit for communicating a full proposition, we assume that aligning at this level can improve the overall quality. Both hypotheses are tested and validated with several embedding models on German datasets with varying degrees of parallelism. The advantages and disadvantages of the different approaches are presented, and our next steps are outlined.

Disagreements in analyses of rhetorical text structure: A new dataset and first analyses
Freya Hewett | Manfred Stede
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)

Discourse structure annotation is known to involve a high level of subjectivity, which often results in low inter-annotator agreement. In this paper, we focus on “legitimate disagreements”, by which we refer to multiple valid annotations for a text or text segment. We provide a new dataset of English and German texts, where each text comes with two parallel analyses (both done by well-trained annotators) in the framework of Rhetorical Structure Theory. Using the RST-Tace tool, we build a list of all conflicting annotation decisions and present some statistics for the corpus. Thereafter, we undertake a qualitative analysis of the disagreements and propose a typology of underlying reasons. From this we derive the need to differentiate two kinds of ambiguities in RST annotation: those that result from inherent “everyday” linguistic ambiguity, and those that arise from specifications in the theory and/or the annotation schemes.

Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025)
Valerio Basile | Cristina Bosco | Francesca Grasso | Muhammad Okky Ibrohim | Maria Skeppstedt | Manfred Stede
Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025)

AfD-CCC: Analyzing the Climate Change Discourse of a German Right-wing Political Party
Manfred Stede | Ronja Memminger
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)

While the scientific consensus on anthropogenic climate change (CC) has been undisputed for a long time, public discourse is still divided. Considering the case of Europe, in the majority of countries, an influential right-wing party propagates climate scepticism or outright denial. Our work addresses the German party AfD, which represents the second-largest faction in the federal parliament. In order to make the party’s discourse on CC accessible to NLP-based analyses, we are compiling the AfD-CCC, a collection of parliamentary speeches and other material from various sources. We report on first analyses of this new dataset using sentiment and emotion analysis as well as classification of populist language, which demonstrate clear differences from the language use of the two largest competing parties (social democrats and conservatives). We make the corpus available to enable further studies of the party’s rhetoric on CC topics.

Assessing Open-Weight Large Language Models on Argumentation Mining Subtasks
Mohammad Yeghaneh Abkenar | Weixing Wang | Hendrik Graupner | Manfred Stede
Proceedings of the 10th edition of the Swiss Text Analytics Conference

2024

NYTAC-CC: A Climate Change Subcorpus of New York Times Articles
Francesca Grasso | Ronny Patz | Manfred Stede
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)

Over the past decade, the analysis of discourses on climate change (CC) has gained increased interest within the social sciences and the NLP community. Textual resources are crucial for understanding how narratives about this phenomenon are crafted and delivered. However, there is still a scarcity of datasets that cover CC in news media in a representative way. This paper presents a CC-specific subcorpus extracted from the New York Times Annotated Corpus of 1.8 million articles, marking the first CC analysis on this data. The subcorpus was created by combining different methods for text selection to ensure representativeness and reliability, which is further validated using ClimateBERT. To provide initial insights into the CC subcorpus, we discuss the results of a topic modeling experiment (LDA). These show the diversity of contexts in which CC is discussed in news media over time, which is relevant for various downstream tasks.

Neural Mining of Persian Short Argumentative Texts
Mohammad Yeghaneh Abkenar | Manfred Stede
Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024

Argumentation mining (AM) is concerned with extracting arguments from texts and classifying the elements (e.g., claim and premise) and relations between them, as well as creating an argumentative structure. A significant hurdle to research in this area for the Persian language is the lack of annotated Persian language corpora. This paper introduces the first argument-annotated corpus in Persian and thereby the possibility of expanding argumentation mining to this low-resource language. The starting point is the English argumentative microtext corpus (AMT) (Peldszus and Stede, 2015), and we built the Persian variant by machine translation (MT) and careful post-editing of the output. We call this corpus Persian argumentative microtext (PAMT). Moreover, we present the first results for Argumentative Discourse Unit (ADU) classification for Persian, which is considered to be one of the fundamental subtasks of argumentation mining. We adopted span categorization using the deep learning model of spaCy Version 3.0 (a CNN model on top of Bloom embedding with attention) on the corpus for determining argumentative units and their type (claim vs. premise).

Discourse Parsing for German with new RST Corpora
Sara Shahmohammadi | Manfred Stede
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

Discourse-Level Features in Spoken and Written Communication
Hannah J. Seemann | Sara Shahmohammadi | Manfred Stede | Tatjana Scheffler
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII)
Sophie Henning | Manfred Stede
Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII)

How Diplomats Dispute: The UN Security Council Conflict Corpus
Karolina Zaczynska | Peter Bourgonje | Manfred Stede
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We investigate disputes in the United Nations Security Council (UNSC) by studying the linguistic means of expressing conflicts. As a result, we present the UNSC Conflict Corpus (UNSCon), a collection of 87 UNSC speeches that are annotated for conflicts. We explain and motivate our annotation scheme and report on a series of experiments for automatic conflict classification. Further, we demonstrate the difficulty of dealing with diplomatic language, which is highly complex and often implicit along various dimensions, by providing corpus examples, readability scores, and classification results.

Rhetorical Strategies in the UN Security Council: Rhetorical Structure Theory and Conflicts
Karolina Zaczynska | Manfred Stede
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

More and more corpora are being annotated with Rhetorical Structure Theory (RST) trees, often in a multi-layer scenario, as analyzing RST annotations in combination with other layers can lead to a deeper understanding of texts. To date, however, prior work on RST for the analysis of diplomatic language is scarce. We are interested in political speeches and investigate what rhetorical strategies diplomats use to communicate critique or deal with disputes. To this end, we present a new dataset with RST annotations of 82 diplomatic speeches aligned to existing Conflict annotations (UNSC-RST). We explore ways of using rhetorical trees to analyze an annotated multi-layer corpus, looking at both the relation distribution and the tree structure of speeches. In preliminary analyses we already see patterns that are characteristic of particular topics or countries.

Elaborative Simplification for German-Language Texts
Freya Hewett | Hadi Asghari | Manfred Stede
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

There are many strategies used to simplify texts. In this paper, we focus specifically on the act of inserting information, or elaborative simplification. Adding information is done for various reasons, such as providing definitions for concepts, making relations between concepts more explicit, and providing background information that is a prerequisite for the main content. As all of these reasons have the main goal of ensuring coherence, we first conduct a corpus analysis of simplified German-language texts that have been annotated with Rhetorical Structure Theory (RST). We focus specifically on how additional information is incorporated into the RST annotation for a text. We then transfer these insights to automatic simplification using Large Language Models (LLMs), as elaborative simplification is a nuanced task which LLMs still seem to struggle with.

2023

Towards Fine-Grained Argumentation Strategy Analysis in Persuasive Essays
Robin Schaefer | René Knaebel | Manfred Stede
Proceedings of the 10th Workshop on Argument Mining

We define an argumentation strategy as the set of rhetorical and stylistic means that authors employ to produce an effective, and often persuasive, text. First computational accounts of such strategies have been relatively coarse-grained, while in our work we aim to move to a more detailed analysis. We extend the annotations of the Argument Annotated Essays corpus (Stab and Gurevych, 2017) with specific types of claims and premises, propose a model for their automatic identification and show first results, and then we discuss usage patterns that emerge with respect to the essay structure, the “flows” of argument component types, the claim-premise constellations, the role of the essay prompt type, and that of the individual author.

Encoding Discourse Structure: Comparison of RST and QUD
Sara Shahmohammadi | Hannah Seemann | Manfred Stede | Tatjana Scheffler
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)

We present a quantitative and qualitative comparison of the discourse trees defined by the Rhetorical Structure Theory and Questions under Discussion models. Based on an empirical analysis of parallel annotations for 28 texts (blog posts and podcast transcripts), we conclude that both discourse frameworks capture similar structural information. The qualitative analysis shows that while complex discourse units often match between analyses, QUD structures do not indicate the centrality of segments.

The UNSC-Graph: An Extensible Knowledge Graph for the UNSC Corpus
Stian Rødven-Eide | Karolina Zaczynska | Antonio Pires | Ronny Patz | Manfred Stede
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

Discourse Sense Flows: Modelling the Rhetorical Style of Documents across Various Domains
Rene Knaebel | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent research on shallow discourse parsing has given renewed attention to the role of discourse relation signals, in particular explicit connectives and so-called alternative lexicalizations. In our work, we first develop new models for extracting signals and classifying their senses, both for explicit connectives and alternative lexicalizations, based on the Penn Discourse Treebank v3 corpus. Thereafter, we apply these models to various raw corpora, and we introduce ‘discourse sense flows’, a new way of modeling the rhetorical style of a document by the linear order of coherence relations, as captured by the PDTB senses. The corpora span several genres and domains, and we undertake comparative analyses of the sense flows, as well as experiments on automatic genre/domain discrimination using discourse sense flow patterns as features. We find that n-gram patterns are indeed stronger predictors than simple sense (unigram) distributions.

Communicating Climate Change: A Comparison Between Tweets and Speeches by German Members of Parliament
Robin Schaefer | Christoph Abels | Stephan Lewandowsky | Manfred Stede
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Twitter and parliamentary speeches are very different communication channels, but many members of parliament (MPs) make use of both. Focusing on the topic of climate change, we undertake a comparative analysis of speeches and tweets uttered by MPs in Germany in a recent six-year period. By keyword/hashtag analyses and topic modeling, we find substantial differences along party lines, with left-leaning parties discussing climate change through a crisis frame, while liberal and conservative parties try to address climate change through the lens of climate-friendly technology and practices. Only the AfD denies the need to adopt climate change mitigating measures, demeaning those concerned about a deteriorating climate as a climate cult or fanatics. Our analysis reveals that climate change communication does not differ substantially between Twitter and parliamentary speeches, but does differ across the political spectrum.

2022

On Selecting Training Corpora for Cross-Domain Claim Detection
Robin Schaefer | René Knaebel | Manfred Stede
Proceedings of the 9th Workshop on Argument Mining

Identifying claims in text is a crucial first step in argument mining. In this paper, we investigate factors for the composition of training corpora to improve cross-domain claim detection. To this end, we use four recent argumentation corpora annotated with claims and submit them to several experimental scenarios. Our results indicate that the “ideal” composition of training corpora is characterized by a large corpus size, homogeneous claim proportions, and less formal text domains.

Extractive Summarisation for German-language Data: A Text-level Approach with Discourse Features
Freya Hewett | Manfred Stede
Proceedings of the 29th International Conference on Computational Linguistics

We examine the link between facets of Rhetorical Structure Theory (RST) and the selection of content for extractive summarisation, for German-language texts. For this purpose, we produce a set of extractive summaries for a dataset of German-language newspaper commentaries, a corpus which already has several layers of annotation. We provide an in-depth analysis of the connection between summary sentences and several RST-based features and transfer these insights to various automated summarisation models. Our results show that RST features are informative for the task of extractive summarisation, particularly nuclearity and relations at sentence-level.

Towards Identifying Alternative-Lexicalization Signals of Discourse Relations
René Knaebel | Manfred Stede
Proceedings of the 29th International Conference on Computational Linguistics

The task of shallow discourse parsing in the Penn Discourse Treebank (PDTB) framework has traditionally been restricted to identifying those relations that are signaled by a discourse connective (“explicit”) and those that have no signal at all (“implicit”). The third type, the more flexible group of “AltLex” realizations, has been neglected because of its small number of occurrences in the PDTB2 corpus. Their number has grown significantly in the recent PDTB3, and in this paper, we present the first approaches for recognizing these “alternative lexicalizations”. We compare the performance of a pattern-based approach and a sequence labeling model, add an experiment on the pre-classification of candidate sentences, and provide an initial qualitative analysis of the error cases made by both models.

UNSC-NE: A Named Entity Extension to the UN Security Council Debates Corpus
Luis Glaser | Ronny Patz | Manfred Stede
Journal for Language Technology and Computational Linguistics, Vol. 35 No. 2

Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
Robin Schaefer | Xiaoyu Bai | Manfred Stede | Torsten Zesch
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

Argument Similarity Assessment in German for Intelligent Tutoring: Crowdsourced Dataset and First Experiments
Xiaoyu Bai | Manfred Stede
Proceedings of the Thirteenth Language Resources and Evaluation Conference

NLP technologies such as text similarity assessment, question answering and text classification are increasingly being used to develop intelligent educational applications. The long-term goal of our work is an intelligent tutoring system for German secondary schools, which will support students in a school exercise that requires them to identify arguments in an argumentative source text. The present paper presents our work on a central subtask, viz. the automatic assessment of similarity between a pair of argumentative text snippets in German. In the designated use case, students write out key arguments from a given source text; the tutoring system then evaluates them against a target reference, assessing the similarity level between student work and the reference. We collect a dataset for our similarity assessment task through crowdsourcing as authentic German student data are scarce; we label the collected text pairs with similarity scores on a 5-point scale and run first experiments on the task. We see that a model based on BERT shows promising results, while we also discuss some challenges that we observe.

GerCCT: An Annotated Corpus for Mining Arguments in German Tweets on Climate Change
Robin Schaefer | Manfred Stede
Proceedings of the Thirteenth Language Resources and Evaluation Conference

While the field of argument mining has grown notably in the last decade, the Twitter medium remains relatively understudied. Given the difficulty of mining arguments in tweets, recent work on creating annotated resources mainly utilized simplified annotation schemes that focus on single argument components, i.e., on claim or evidence. In this paper we strive to fill this research gap by presenting GerCCT, a new corpus of German tweets on climate change, which was annotated for a set of different argument components and properties. Additionally, we labelled sarcasm and toxic language to facilitate the development of tools for filtering out non-argumentative content. This, to the best of our knowledge, renders our corpus the first tweet resource annotated for argumentation, sarcasm and toxic language. We show that a comparatively complex annotation scheme can still yield promising inter-annotator agreement. We further present first good supervised classification results yielded by a fine-tuned BERT architecture.

2021

Proceedings of the 8th Workshop on Argument Mining
Khalid Al-Khatib | Yufang Hou | Manfred Stede
Proceedings of the 8th Workshop on Argument Mining

UPAppliedCL at GermEval 2021: Identifying Fact-Claiming and Engaging Facebook Comments Using Transformers
Robin Schaefer | Manfred Stede
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments

In this paper we present UPAppliedCL’s contribution to the GermEval 2021 Shared Task. In particular, we participated in Subtasks 2 (Engaging Comment Classification) and 3 (Fact-Claiming Comment Classification). While acceptable results can be obtained by using unigrams or linguistic features in combination with traditional machine learning models, we show that for both tasks transformer models trained on fine-tuned BERT embeddings yield best results.

Automatically evaluating the conceptual complexity of German texts
Freya Hewett | Manfred Stede
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

The Climate Change Debate and Natural Language Processing
Manfred Stede | Ronny Patz
Proceedings of the 1st Workshop on NLP for Positive Impact

The debate around climate change (CC), concerning its extent, its causes, and the necessary responses, is intense and of global importance. Yet, in the natural language processing (NLP) community, this domain has so far received little attention. In contrast, it is of enormous prominence in various social science disciplines, and some of that work follows the “text-as-data” paradigm, seeking to employ quantitative methods for analyzing large amounts of CC-related text. Other research is qualitative in nature and studies details, nuances, actors, and motivations within CC discourses. Coming from both NLP and Political Science, and reviewing key works in both disciplines, we discuss how social science approaches to CC debates can inform advances in text-mining/NLP, and how, in return, NLP can support policy-makers and activists in making sense of large-scale and complex CC discourses across multiple genres, channels, topics, and communities. This is paramount for their ability to make rapid and meaningful impact on the discourse, and for shaping the necessary policy change.

2020

Annotation and Detection of Arguments in Tweets
Robin Schaefer | Manfred Stede
Proceedings of the 7th Workshop on Argument Mining

Notwithstanding the increasing role Twitter plays in modern political and social discourse, resources built for conducting argument mining on tweets remain limited. In this paper, we present a new corpus of German tweets annotated for argument components. To the best of our knowledge, this is the first corpus containing not only annotated full tweets but also argumentative spans within tweets. We further report first promising results using supervised classification (F1: 0.82) and sequence labeling (F1: 0.72) approaches.

Contextualized Embeddings for Connective Disambiguation in Shallow Discourse Parsing
René Knaebel | Manfred Stede
Proceedings of the First Workshop on Computational Approaches to Discourse

This paper studies a novel model that simplifies the disambiguation of connectives for explicit discourse relations. We use a neural approach that integrates contextualized word embeddings and predicts whether a connective candidate is part of a discourse relation or not. We study the influence of those context-specific embeddings. Further, we show the benefit of training the tasks of connective disambiguation and sense classification together at the same time. The success of our approach is supported by state-of-the-art results.

Exploiting a lexical resource for discourse connective disambiguation in German
Peter Bourgonje | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we focus on connective identification and sense classification for explicit discourse relations in German, as two individual sub-tasks of the overarching Shallow Discourse Parsing task. We successively augment a purely-empirical approach based on contextualised embeddings with linguistic knowledge encoded in a connective lexicon. In this way, we improve over published results for connective identification, achieving a final F1-score of 87.93; and we introduce, to the best of our knowledge, first results for German sense classification, achieving an F1-score of 87.13. Our approach demonstrates that a connective lexicon can be a valuable resource for those languages that do not have a large PDTB-style-annotated corpus available.

Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution on both spoken language and social media, we undertake a corpus study involving the various genre sections of OntoNotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that previously have been applied individually to different data sets, we find fairly clear patterns of “behavior” for the different genres/media. Besides their role for psycholinguistic investigation (why do we employ different coreference strategies when we write or speak?) and for the placement of Twitter in the spoken–written continuum, we see our results as a contribution to approaching genre-/media-specific coreference resolution.

Adapting Coreference Resolution to Twitter Conversations
Berfin Aktaş | Veronika Solopova | Annalena Kohnert | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2020

The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the system of Lee et al. (2018), which was originally trained on OntoNotes, by retraining on manually-annotated Twitter conversation data. Further experiments by combining different portions of OntoNotes with Twitter data show that selecting text genres for the training data can beat the mere maximization of training data amount. In addition, we inspect several phenomena such as the role of deictic pronouns in conversational data, and present additional results for variant settings. Our best configuration improves the performance of the “out of the box” system by 21.6%.

Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection
Henny Sluyter-Gäthje | Peter Bourgonje | Manfred Stede
Proceedings of the Twelfth Language Resources and Evaluation Conference

Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exist only for English; any other language is in this respect an under-resourced one. For those languages where machine translation from English is available with reasonable quality, MT in conjunction with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank into German and experiment with various methods of annotation projection to arrive at the German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of errors encountered during its creation. Then we evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and compare performance to the same components when trained on the gold, original PDTB corpus.

The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing
Peter Bourgonje | Manfred Stede
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present the Potsdam Commentary Corpus 2.2, a German corpus of news editorials annotated on several different levels. New in the 2.2 version of the corpus are two additional annotation layers for coherence relations following the Penn Discourse TreeBank framework. Specifically, we add relation senses to an already existing layer of discourse connectives and their arguments, and we introduce a new layer with additional coherence relation types, resulting in a German corpus that mirrors the PDTB. The aim of this is to increase usability of the corpus for the task of shallow discourse parsing. In this paper, we provide inter-annotator agreement figures for the new annotations and compare corpus statistics based on the new annotations to the equivalent statistics extracted from the PDTB.

DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives
Debopam Das | Manfred Stede | Soumya Sankar Ghosh | Lahari Chatterjee
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present DiMLex-Bangla, a newly developed lexicon of discourse connectives in Bangla. The lexicon, upon completion of its first version, contains 123 Bangla connective entries, which are primarily compiled from the linguistic literature and translation of English discourse connectives. The lexicon compilation is later augmented by adding more connectives from a currently developed corpus, called the Bangla RST Discourse Treebank (Das and Stede, 2018). DiMLex-Bangla provides information on syntactic categories of Bangla connectives, their discourse semantics and non-connective uses (if any). It uses the format of the German connective lexicon DiMLex (Stede and Umbach, 1998), which provides a cross-linguistically applicable XML schema. The resource is the first of its kind in Bangla, and is freely available for use in studies on discourse structure and computational applications.

Semi-Supervised Tri-Training for Explicit Discourse Argument Expansion
René Knaebel | Manfred Stede
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes a novel application of semi-supervision for shallow discourse parsing. We use a neural approach for sequence tagging and focus on the extraction of explicit discourse arguments. First, additional unlabeled data is prepared for semi-supervised learning. From this data, weak annotations are generated in a first setting and later used in another setting to study performance differences. In our studies, we show an increase in the performance of our models that ranges between 2 and 10% F1 score. Further, we give some insights into the generated discourse annotations and compare the developed additional relations with the training relations. We release this new dataset of explicit discourse arguments to enable the training of large statistical models.

2019

Mining Italian Short Argumentative Texts
Ivan Namor | Pietro Totis | Samuele Garda | Manfred Stede
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Window-Based Neural Tagging for Shallow Discourse Argument Labeling
René Knaebel | Manfred Stede | Sebastian Stober
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

This paper describes a novel approach for the task of end-to-end argument labeling in shallow discourse parsing. Our method describes a decomposition of the overall labeling task into subtasks and a general distance-based aggregation procedure. For learning these subtasks, we train a recurrent neural network and gradually replace existing components of our baseline by our model. The model is trained and evaluated on the Penn Discourse Treebank 2 corpus. While it is not as good as knowledge-intense approaches, it clearly outperforms other models that are also trained without additional linguistic features.

Automated Cross-language Intelligibility Analysis of Parkinson’s Disease Patients Using Speech Recognition Technologies
Nina Hosseini-Kivanani | Juan Camilo Vásquez-Correa | Manfred Stede | Elmar Nöth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Speech deficits are common symptoms among Parkinson’s Disease (PD) patients. The automatic assessment of speech signals is promising for the evaluation of the neurological state and the speech quality of the patients. Recently, progress has been made in applying machine learning and computational methods to automatically evaluate the speech of PD patients. In the present study, we plan to analyze the speech signals of PD patients and healthy control (HC) subjects in three different languages: German, Spanish, and Czech, with the aim to identify biomarkers to discriminate between PD patients and HC subjects and to evaluate the neurological state of the patients. Therefore, the main contribution of this study is the automatic classification of PD patients and HC subjects in different languages, with a focus on phonation, articulation, and prosody. We will focus on an intelligibility analysis based on automatic speech recognition systems trained on these three languages. This is one of the first studies that considers the evaluation of the speech of PD patients in different languages. The purpose of this research proposal is to build a model that can discriminate PD and HC subjects even when the languages used for training and testing are different.

Annotating Shallow Discourse Relations in Twitter Conversations
Tatjana Scheffler | Berfin Aktaş | Debopam Das | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and annotation, including an inter-annotator agreement study. We discuss our observations as to how Twitter discourses differ from written news text with respect to discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that in Twitter, connective arguments frequently are not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.

RST-Tace: A tool for automatic comparison and evaluation of RST trees
Shujun Wan | Tino Kutschbach | Anke Lüdeling | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

This paper presents RST-Tace, a tool for automatic comparison and evaluation of RST trees. RST-Tace serves as an implementation of Iruskieta’s comparison method, which allows trees to be compared and evaluated without the influence of decisions at lower levels in a tree in terms of four factors: constituent, attachment point, nuclearity as well as relation. RST-Tace can be used regardless of the language or the size of rhetorical trees. This tool aims to measure the agreement between two annotators. The result is reflected by F-measure and inter-annotator agreement. Both the comparison table and the result of the evaluation can be obtained automatically.

Coherence models in schizophrenia
Sandra Just | Erik Haegert | Nora Kořánová | Anna-Lena Bröcker | Ivan Nenchev | Jakob Funcke | Christiane Montag | Manfred Stede
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

Incoherent discourse in schizophrenia has long been recognized as a dominant symptom of the mental disorder (Bleuler, 1911/1950). Recent studies have used modern sentence and word embeddings to compute coherence metrics for spontaneous speech in schizophrenia. While clinical ratings always have a subjective element, computational linguistic methodology allows quantification of speech abnormalities. Clinical and empirical knowledge from psychiatry provide the theoretical and conceptual basis for modelling. Our study is an interdisciplinary attempt at improving coherence models in schizophrenia. Speech samples were obtained from healthy controls and patients with a diagnosis of schizophrenia or schizoaffective disorder and different severity of positive formal thought disorder. Interviews were transcribed and coherence metrics derived from different embeddings. One model found higher coherence metrics for controls than patients. All other models remained non-significant. More detailed analysis of the data motivates different approaches to improving coherence models in schizophrenia, e.g. by assessing referential abnormalities.

The Utility of Discourse Parsing Features for Predicting Argumentation Structure
Freya Hewett | Roshan Prakash Rane | Nina Harlacher | Manfred Stede
Proceedings of the 6th Workshop on Argument Mining

Research on argumentation mining from text has frequently discussed relationships to discourse parsing, but few empirical results are available so far. One corpus that has been annotated in parallel for argumentation structure and for discourse structure (RST, SDRT) is the ‘argumentative microtexts’ (Peldszus and Stede, 2016a). While results on perusing the gold RST annotations for predicting argumentation have been published (Peldszus and Stede, 2016b), the step to automatic discourse parsing has not yet been taken. In this paper, we run various discourse parsers (RST, PDTB) on the corpus, compare their results to the gold annotations (for RST) and then assess the contribution of automatically-derived discourse features for argumentation parsing. After reproducing the state-of-the-art Evidence Graph model from Afantenos et al. (2018) for the microtexts, we find that PDTB features can indeed improve its performance.

Computational Argumentation Synthesis as a Language Modeling Task
Roxanne El Baff | Henning Wachsmuth | Khalid Al Khatib | Manfred Stede | Benno Stein
Proceedings of the 12th International Conference on Natural Language Generation

Synthesis approaches in computational argumentation so far are restricted to generating claim-like argument units or short summaries of debates. Ultimately, however, we expect computers to generate whole new arguments for a given stance towards some topic, backing up claims following argumentative and rhetorical considerations. In this paper, we approach such an argumentation synthesis as a language modeling task. In our language model, argumentative discourse units are the “words”, and arguments represent the “sentences”. Given a pool of units for any unseen topic-stance pair, the model selects a set of unit types according to a basic rhetorical strategy (logos vs. pathos), arranges the structure of the types based on the units’ argumentative roles, and finally “phrases” an argument by instantiating the structure with semantically coherent units from the pool. Our evaluation suggests that the model can, to some extent, mimic the human synthesis of strategy-specific arguments.

2018

Classifying Italian newspaper text: news or editorial?
Pietro Totis | Manfred Stede
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Argumentation Synthesis following Rhetorical Strategies
Henning Wachsmuth | Manfred Stede | Roxanne El Baff | Khalid Al-Khatib | Maria Skeppstedt | Benno Stein
Proceedings of the 27th International Conference on Computational Linguistics

Persuasion is rarely achieved through a loose set of arguments alone. Rather, an effective delivery of arguments follows a rhetorical strategy, combining logical reasoning with appeals to ethics and emotion. We argue that such a strategy means to select, arrange, and phrase a set of argumentative discourse units. In this paper, we model rhetorical strategies for the computational synthesis of effective argumentation. In a study, we let 26 experts synthesize argumentative texts with different strategies for 10 topics. We find that the experts agree in the selection significantly more when following the same strategy. While the texts notably vary for different strategies, especially their arrangement remains stable. The results suggest that our model enables a strategical synthesis.

A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations
Elena Musi | Tariq Alhindi | Manfred Stede | Leonard Kriese | Smaranda Muresan | Andrea Rocci
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Developing the Bangla RST Discourse Treebank
Debopam Das | Manfred Stede
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Lexicon of Discourse Markers for Portuguese – LDM-PT
Amália Mendes | Iria del Rio | Manfred Stede | Felix Dombek
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Anaphora Resolution for Twitter Conversations: An Exploratory Study
Berfin Aktaş | Tatjana Scheffler | Manfred Stede
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

We present a corpus study of pronominal anaphora on Twitter conversations. After outlining the specific features of this genre, with respect to reference resolution, we explain the construction of our corpus and the annotation steps. From this we derive a list of phenomena that need to be considered when performing anaphora resolution on this type of data. Finally, we test the performance of an off-the-shelf resolution system, and provide some qualitative error analysis.

Identifying Explicit Discourse Connectives in German
Peter Bourgonje | Manfred Stede
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We are working on an end-to-end Shallow Discourse Parsing system for German and in this paper focus on the first subtask: the identification of explicit connectives. Starting with the feature set from an English system and a Random Forest classifier, we evaluate our approach on a (relatively small) German annotated corpus, the Potsdam Commentary Corpus. We introduce new features and experiment with including additional training data obtained through annotation projection and achieve an f-score of 83.89.

Constructing a Lexicon of English Discourse Connectives
Debopam Das | Tatjana Scheffler | Peter Bourgonje | Manfred Stede
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We present a new lexicon of English discourse connectives called DiMLex-Eng, built by merging information from two annotated corpora and an additional list of relation signals from the literature. The format follows the German connective lexicon DiMLex, which provides a cross-linguistically applicable XML schema. DiMLex-Eng contains 149 English connectives, and gives information on syntactic categories, discourse semantics and non-connective uses (if any). We report on the development steps and discuss design decisions encountered in the lexicon expansion phase. The resource is freely available for use in studies of discourse structure and computational applications.

More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing
Maria Skeppstedt | Andreas Peldszus | Manfred Stede
Proceedings of the 5th Workshop on Argument Mining

We present an extension of an annotated corpus of short argumentative texts that had originally been built in a controlled text production experiment. Our extension more than doubles the size of the corpus by means of crowdsourcing. We report on the setup of this experiment and on the consequences that crowdsourcing had for assembling the data, and in particular for annotation. We labeled the argumentative structure by marking claims, premises, and relations between them, following the scheme used in the original corpus, but had to make a few modifications in response to interesting phenomena in the data. Finally, we report on an experiment with the automatic prediction of this argumentation structure: We first replicated the approach of an earlier study on the original corpus, and compare the performance to various settings involving the extension.

Stance-Taking in Topics Extracted from Vaccine-Related Tweets and Discussion Forum Posts
Maria Skeppstedt | Manfred Stede | Andreas Kerren
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

The occurrence of stance-taking towards vaccination was measured in documents extracted by topic modelling from two different corpora, one discussion forum corpus and one tweet corpus. For some of the topics extracted, their most closely associated documents contained a proportion of vaccine stance-taking texts that exceeded the corpus average by a large margin. These extracted document sets would, therefore, form a useful resource in a process for computer-assisted analysis of argumentation on the subject of vaccination.

2017

Toward a Bilingual Lexical Database on Connectives: Exploiting a German/Italian Parallel Corpus
Peter Bourgonje | Yulia Grishina | Manfred Stede
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Multi-source annotation projection of coreference chains: assessing strategies and testing opportunities
Yulia Grishina | Manfred Stede
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

In this paper, we examine the possibility of using annotation projection from multiple sources for automatically obtaining coreference annotations in the target language. We implement a multi-source annotation projection algorithm and apply it on an English-German-Russian parallel corpus in order to transfer coreference chains from two sources to the target side. Operating in two settings – a low-resource and a more linguistically-informed one – we show that automatic coreference transfer could benefit from combining information from multiple languages, and assess the quality of both the extraction and the linking of target coreference mentions.

The Good, the Bad, and the Disagreement: Complex ground truth in rhetorical structure analysis
Debopam Das | Manfred Stede | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Kristiina Jokinen | Manfred Stede | David DeVault | Annie Louis
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Automatic detection of stance towards vaccination in online discussion forums
Maria Skeppstedt | Andreas Kerren | Manfred Stede
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance ‘against’ or ‘for’ vaccination, or as ‘undecided’. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance ‘against’ vaccination from stance ‘for’ vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.

Extracting word lists for domain-specific implicit opinions from corpora
Núria Bertomeu Castelló | Manfred Stede
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

2016

Towards assessing depth of argumentation
Manfred Stede
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

For analyzing argumentative text, we propose to study the ‘depth’ of argumentation as one important component, which we distinguish from argument quality. In a pilot study with German newspaper commentary texts, we asked students to rate the degree of argumentativeness, and then looked for correlations with features of the annotated argumentation structure and the rhetorical structure (in terms of RST). The results indicate that the human judgements correlate with our operationalization of depth and with certain structural features of RST trees.

OPT: Oslo–Potsdam–Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing
Stephan Oepen | Jonathon Read | Tatjana Scheffler | Uladzimir Sidarenka | Manfred Stede | Erik Velldal | Lilja Øvrelid
Proceedings of the CoNLL-16 shared task

Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
Tatjana Scheffler | Manfred Stede
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

DiMLex is a lexicon of German connectives that can be used for various language understanding purposes. We enhanced the coverage to 275 connectives, which we regard as covering all known German discourse connectives in current use. In this paper, we consider the task of adding the semantic relations that can be expressed by each connective. After discussing different approaches to retrieving semantic information, we settle on annotating each connective with senses from the new PDTB 3.0 sense hierarchy. We describe our new implementation in the extended DiMLex, which will be available for research purposes.

Parallel Discourse Annotations on a Corpus of Short Texts
Manfred Stede | Stergos Afantenos | Andreas Peldszus | Nicholas Asher | Jérémy Perret
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the first corpus of texts annotated with two alternative approaches to discourse structure, Rhetorical Structure Theory (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (Asher and Lascarides, 2003). 112 short argumentative texts have been analyzed according to these two theories. Furthermore, in previous work, the same texts have already been annotated for their argumentation structure, according to the scheme of Peldszus and Stede (2013). This corpus therefore enables studies of correlations between the two accounts of discourse structure, and between discourse and argumentation. We converted the three annotation formats to a common dependency tree format that makes it possible to compare the structures, and we describe some initial findings.

Information structure in the Potsdam Commentary Corpus: Topics
Manfred Stede | Sara Mamprin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Potsdam Commentary Corpus is a collection of 175 German newspaper commentaries annotated on a variety of different layers. This paper introduces a new layer that covers the linguistic notion of information-structural topic (not to be confused with ‘topic’ as applied to documents in information retrieval). To our knowledge, this is the first larger topic-annotated resource for German (and one of the first for any language). We describe the annotation guidelines and the annotation process, and the results of an inter-annotator agreement study, which compare favourably to the related work. The annotated corpus is freely available for research.

Anaphoricity in Connectives: A Case Study on German
Manfred Stede | Yulia Grishina
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

Rhetorical structure and argumentation structure in monologue text
Andreas Peldszus | Manfred Stede
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Generating Sentiment Lexicons for German Twitter
Uladzimir Sidarenka | Manfred Stede
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Despite substantial progress in developing new sentiment lexicon generation (SLG) methods for English, the task of transferring these approaches to other languages and domains in a sound way still remains open. In this paper, we contribute to the solution of this problem by systematically comparing semi-automatic translations of common English polarity lists with the results of the original automatic SLG algorithms, which were applied directly to German data. We evaluate these lexicons on a corpus of 7,992 manually annotated tweets. In addition to that, we also collate the results of dictionary- and corpus-based SLG methods in order to find out which of these paradigms is better suited for the inherently noisy domain of social media. Our experiments show that semi-automatic translations notably outperform automatic systems (reaching a macro-averaged F1-score of 0.589), and that dictionary-based techniques produce much better polarity lists as compared to corpus-based approaches (whose best F1-scores run up to 0.479 and 0.419 respectively) even for the non-standard Twitter genre.

2015

Joint prediction in MST-style discourse parsing for argumentation mining
Andreas Peldszus | Manfred Stede
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Towards Detecting Counter-considerations in Text
Andreas Peldszus | Manfred Stede
Proceedings of the 2nd Workshop on Argumentation Mining

Knowledge-lean projection of coreference chains across languages
Yulia Grishina | Manfred Stede
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

2014

Potsdam Commentary Corpus 2.0: Annotation for Discourse Research
Manfred Stede | Arne Neumann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a revised and extended version of the Potsdam Commentary Corpus, a collection of 175 German newspaper commentaries (op-ed pieces) that has been annotated with syntax trees and three layers of discourse-level information: nominal coreference, connectives and their arguments (similar to the PDTB, Prasad et al. 2008), and trees reflecting discourse structure according to Rhetorical Structure Theory (Mann/Thompson 1988). Connectives have been annotated with the help of a semi-automatic tool, Conano (Stede/Heintze 2004), which identifies most connectives and suggests arguments based on their syntactic category. The other layers have been created manually with dedicated annotation tools. The corpus is made available on the one hand as a set of original XML files produced with the annotation tools, based on identical tokenization. On the other hand, it is distributed together with the open-source linguistic database ANNIS3 (Chiarcos et al. 2008; Zeldes et al. 2009), which provides multi-layer search functionality and layer-specific visualization modules. This allows for comfortable qualitative evaluation of the correlations between annotation layers.

A Model for Processing Illocutionary Structures and Argumentation in Debates
Kasia Budzynska | Mathilde Janier | Chris Reed | Patrick Saint-Dizier | Manfred Stede | Olena Yakorska
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we briefly present the objectives of Inference Anchoring Theory (IAT) and the formal structure which is proposed for dialogues. Then, we introduce our development corpus, and a computational model designed for the identification of discourse minimal units in the context of argumentation and the illocutionary force associated with each unit. We show the categories of resources which are needed and how they can be reused in different contexts.

GraPAT: a Tool for Graph Annotations
Jonathan Sonntag | Manfred Stede
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce GraPAT, a web-based annotation tool for building graph structures over text. Graphs have been demonstrated to be relevant in a variety of quite diverse annotation efforts and in different NLP applications, and they serve to model annotators’ intuitions quite closely. In particular, in this paper we discuss the implementation of graph annotations for sentiment analysis, argumentation structure, and rhetorical text structures. All of these scenarios can create certain problems for existing annotation tools, and we show how GraPAT can help to overcome such difficulties.

Conceptual and Practical Steps in Event Coreference Analysis of Large-scale Data
Fatemeh Torabi Asr | Jonathan Sonntag | Yulia Grishina | Manfred Stede
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation

Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop
Lori Levin | Manfred Stede
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

2013

Discourse Processing
Manfred Stede
NAACL HLT 2013 Tutorial Abstracts

From newspaper to microblogging: What does it take to find opinions?
Wladimir Sidorenko | Jonathan Sonntag | Nina Krüger | Stefan Stieglitz | Manfred Stede
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF
Arne Neumann | Nancy Ide | Manfred Stede
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Ranking the annotators: An agreement study on argumentation structure
Andreas Peldszus | Manfred Stede
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Towards a Tool for Interactive Concept Building for Large Scale Analysis in the Humanities
Andre Blessing | Jonathan Sonntag | Fritz Kliche | Ulrich Heid | Jonas Kuhn | Manfred Stede
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2012

SemScribe: Natural Language Generation for Medical Reports
Sebastian Varges | Heike Bieler | Manfred Stede | Lukas C. Faulstich | Kristin Irsig | Malik Atalla
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Natural language generation in the medical domain is heavily influenced by domain knowledge and genre-specific text characteristics. We present SemScribe, an implemented natural language generation system that produces doctor's letters, in particular descriptions of cardiological findings. Texts in this domain are characterized by a high density of information and a relatively telegraphic style. Domain knowledge is encoded in a medical ontology of about 80,000 concepts. The ontology is used in particular for concept generalizations during referring expression generation. Architecturally, the system is a generation pipeline that uses a corpus-informed syntactic frame approach for realizing sentences appropriate to the domain. The system reads XML documents conforming to the HL7 Clinical Document Architecture (CDA) Standard and enhances them with generated text and references to the used data elements. We conducted a first clinical trial evaluation with medical staff and report on the findings.

2011

Lexicon-Based Methods for Sentiment Analysis
Maite Taboada | Julian Brooke | Milan Tofiloski | Kimberly Voll | Manfred Stede
Computational Linguistics, Volume 37, Issue 2 - June 2011

2009

Proceedings of the Third Linguistic Annotation Workshop (LAW III)
Manfred Stede | Chu-Ren Huang | Nancy Ide | Adam Meyers
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

By all these lovely tokens... Merging Conflicting Tokenizations
Christian Chiarcos | Julia Ritz | Manfred Stede
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Genre-Based Paragraph Classification for Sentiment Analysis
Maite Taboada | Julian Brooke | Manfred Stede
Proceedings of the SIGDIAL 2009 Conference

2008

A Flexible Framework for Integrating Annotations from Different Tools and Tag Sets
Christian Chiarcos | Stefanie Dipper | Michael Götze | Ulf Leser | Anke Lüdeling | Julia Ritz | Manfred Stede
Traitement Automatique des Langues, Volume 49, Numéro 2 : Plate-formes pour le traitement automatique des langues [Platforms for Natural Language Processing]

Connective-based Local Coherence Analysis: A Lexicon for Recognizing Causal Relationships
Manfred Stede
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

Identifying Formal and Functional Zones in Film Reviews
Heike Bieler | Stefanie Dipper | Manfred Stede
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

Proceedings of the Linguistic Annotation Workshop
Branimir Boguraev | Nancy Ide | Adam Meyers | Shigeko Nariyama | Manfred Stede | Janyce Wiebe | Graham Wilcock
Proceedings of the Linguistic Annotation Workshop

Panel Session: Discourse Annotation
Manfred Stede | Janyce Wiebe | Eva Hajičová | Brian Reese | Simone Teufel | Bonnie Webber | Theresa Wilson
Proceedings of the Linguistic Annotation Workshop

2004

Machine-Assisted Rhetorical Structure Annotation
Manfred Stede | Silvan Heintze
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

The Potsdam Commentary Corpus
Manfred Stede
Proceedings of the Workshop on Discourse Annotation

Feeding OWL: Extracting and Representing the Content of Pathology Reports
David Schlangen | Manfred Stede | Elena Paslaru Bontas
Proceedings of the Workshop on NLP and XML (NLPXML-2004): RDF/RDFS and OWL in Language Technology

2003

Rhetorical Parsing with Underspecification and Forests
Thomas Hanneforth | Silvan Heintze | Manfred Stede
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

Surfaces and depths in text understanding: The case of newspaper commentary
Manfred Stede
Proceedings of the HLT-NAACL 2003 Workshop on Text Meaning

Step by step: underspecified markup in incremental rhetorical analysis
David Reitter | Manfred Stede
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

2002

Polibox: Generating Descriptions, Comparisons, and Recommendations from a Database
Manfred Stede
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

XML/XSL in the Dictionary: The Case of Discourse Markers
Daniela Berger | David Reitter | Manfred Stede
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)

2000

Book Reviews: Predicative Forms in Natural Language and in Lexical Knowledge Bases
Manfred Stede
Computational Linguistics, Volume 26, Number 2, June 2000

The hyperonym problem revisited: Conceptual and lexical hierarchies in language generation
Manfred Stede
INLG’2000 Proceedings of the First International Conference on Natural Language Generation

1998

DiMLex: A lexicon of discourse markers for text generation and understanding
Manfred Stede | Carla Umbach
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

A Generative Perspective on Verb Alternations
Manfred Stede
Computational Linguistics, Volume 24, Number 3, September 1998

DiMLex: A Lexicon of Discourse Markers for Text Generation and Understanding
Manfred Stede | Carla Umbach
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Discourse Marker Choice in Sentence Planning
Brigitte Grote | Manfred Stede
Natural Language Generation

1997

Discourse particles and routine formulas in spoken language translation
Manfred Stede | Birte Schmitz
Spoken Language Translation

1996

A generative perspective on verbs and their readings
Manfred Stede
Eighth International Natural Language Generation Workshop

1994

TECHDOC: Multilingual generation of online and offline instructional text
Dietmar Rösner | Manfred Stede
Fourth Conference on Applied Natural Language Processing

Generating Multilingual Documents from a Knowledge Base: The TECHDOC Project
Dietmar Rösner | Manfred Stede
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

1993

Lexical Choice Criteria in Language Generation
Manfred Stede
Sixth Conference of the European Chapter of the Association for Computational Linguistics

Co-authors

Yulia Grishina 5

Maria Skeppstedt 5

Berfin Aktaş 4

Jonathan Sonntag 4

Khalid Al Khatib 3

Francesca Grasso 3

Sara Shahmohammadi 3

Maite Taboada 3

Karolina Zaczynska 3

Mohammad Yeghaneh Abkenar 2

Julian Brooke 2

Christian Chiarcos 2

Stefanie Dipper 2

Roxanne El Baff 2

Silvan Heintze 2

Andreas Kerren 2

Anke Lüdeling 2

David Reitter 2

Dietmar Rösner 2

Uladzimir Sidarenka 2

Henning Wachsmuth 2

Christoph Abels 1

Stergos Afantenos 1

Tariq Alhindi 1

Nicholas Asher 1

Valerio Basile 1

Daniela Berger 1

Núria Bertomeu 1

André Blessing 1

Branimir Boguraev 1

Cristina Bosco 1

Anna-Lena Bröcker 1

Kasia Budzynska 1

Lahari Chatterjee 1

David DeVault 1

Iria Del Río Gayo 1

Lukas C. Faulstich 1

Steffen Frenzel 1

Samuele Garda 1

Soumya Sankar Ghosh 1

Hendrik Graupner 1

Brigitte Grote 1

Michael Götze 1

Thomas Hanneforth 1

Nina Harlacher 1

Sophie Henning 1

Nina Hosseini-Kivanani 1

Chu-Ren Huang 1

Muhammad Okky Ibrohim 1

Kristin Irsig 1

Mathilde Janier 1

Kristiina Jokinen 1

Annalena Kohnert 1

Nora Kořánová 1

Leonard Kriese 1

Tino Kutschbach 1

Stephan Lewandowsky 1

Stefano Locci 1

Ronja Memminger 1

Amália Mendes 1

Christiane Montag 1

Smaranda Muresan 1

Shigeko Nariyama 1

Stephan Oepen 1

Elena Paslaru Bontas 1

Jérémy Perret 1

Antonio Pires 1

Maria Poiaganova 1

Roshan Prakash Rane 1

Jonathon Read 1

Stian Rødven-Eide 1

Patrick Saint-Dizier 1

David Schlangen 1

Birte Schmitz 1

Hannah Seemann 1

Hannah J. Seemann 1

Wladimir Sidorenko 1

Henny Sluyter-Gäthje 1

Veronika Solopova 1

Hannah Mathilde Steinbach 1

Stefan Stieglitz 1

Sebastian Stober 1

Simone Teufel 1

Milan Tofiloski 1

Fatemeh Torabi Asr 1

Sebastian Varges 1

Kimberly Voll 1

Juan Camilo Vásquez-Correa 1

Bonnie Webber 1

Graham Wilcock 1

Theresa Wilson 1

Olena Yakorska 1

Imge Yüzüncüoglu 1

Torsten Zesch 1

Lilja Øvrelid 1

Venues