Henning Wachsmuth


2021

pdf bib
Belief-based Generation of Argumentative Claims
Milad Alshomary | Wei-Fan Chen | Timon Gurcke | Henning Wachsmuth
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this work, we argue that augmenting argument generation technology with the ability to encode beliefs is of twofold. First, it gives more control on the generated arguments leading to better reach for audience. Second, it is one way of modeling the human process of synthesizing arguments. Therefore, we propose the task of belief-based claim generation, and study the research question of how to model and encode a user’s beliefs into a generated argumentative text. To this end, we model users’ beliefs via their stances on big issues, and extend state of the art text generation models with extra input reflecting user’s beliefs. Through an automatic evaluation we show empirical evidence of the applicability to encode beliefs into argumentative text. In our manual evaluation, we highlight that the low effectiveness of our approach stems from the noise produced by the automatic collection of bag-of-words, which was mitigated by removing this noise. The finding of this paper lays the ground work to further investigate the role of beliefs in generating better reaching arguments.

pdf bib
Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale
Gabriella Skitalinskaya | Jonas Klaff | Henning Wachsmuth
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Assessing the quality of arguments and of the claims the arguments are composed of has become a key task in computational argumentation. However, even if different claims share the same stance on the same topic, their assessment depends on the prior perception and weighting of the different aspects of the topic being discussed. This renders it difficult to learn topic-independent quality indicators. In this paper, we study claim quality assessment irrespective of discussed aspects by comparing different revisions of the same claim. We compile a large-scale corpus with over 377k claim revision pairs of various types from kialo.com, covering diverse topics from politics, ethics, entertainment, and others. We then propose two tasks: (a) assessing which claim of a revision pair is better, and (b) ranking all versions of a claim by quality. Our first experiments with embedding-based logistic regression and transformer-based neural networks show promising results, suggesting that learned indicators generalize well across topics. In a detailed error analysis, we give insights into what quality dimensions of claims can be assessed reliably. We provide the data and scripts needed to reproduce all results.

pdf bib
Syntopical Graphs for Computational Argumentation Tasks
Joe Barrow | Rajiv Jain | Nedim Lipka | Franck Dernoncourt | Vlad Morariu | Varun Manjunatha | Douglas Oard | Philip Resnik | Henning Wachsmuth
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Approaches to computational argumentation tasks such as stance detection and aspect detection have largely focused on the text of independent claims, losing out on potentially valuable context provided by the rest of the collection. We introduce a general approach to these tasks motivated by syntopical reading, a reading process that emphasizes comparing and contrasting viewpoints in order to improve topic understanding. To capture collection-level context, we introduce the syntopical graph, a data structure for linking claims within a collection. A syntopical graph is a typed multi-graph where nodes represent claims and edges represent different possible pairwise relationships, such as entailment, paraphrase, or support. Experiments applying syntopical graphs to the problems of detecting stance and aspects demonstrate state-of-the-art performance in each domain, significantly outperforming approaches that do not utilize collection-level information.

pdf bib
Employing Argumentation Knowledge Graphs for Neural Argument Generation
Khalid Al Khatib | Lukas Trautner | Henning Wachsmuth | Yufang Hou | Benno Stein
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Generating high-quality arguments, while being challenging, may benefit a wide range of downstream applications, such as writing assistants and argument search engines. Motivated by the effectiveness of utilizing knowledge graphs for supporting general text generation tasks, this paper investigates the usage of argumentation-related knowledge graphs to control the generation of arguments. In particular, we construct and populate three knowledge graphs, employing several compositions of them to encode various knowledge into texts of debate portals and relevant paragraphs from Wikipedia. Then, the texts with the encoded knowledge are used to fine-tune a pre-trained text generation model, GPT-2. We evaluate the newly created arguments manually and automatically, based on several dimensions important in argumentative contexts, including argumentativeness and plausibility. The results demonstrate the positive impact of encoding the graphs’ knowledge into debate portal texts for generating arguments with superior quality than those generated without knowledge.

pdf bib
Counter-Argument Generation by Attacking Weak Premises
Milad Alshomary | Shahbaz Syed | Arkajit Dhar | Martin Potthast | Henning Wachsmuth
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Generating Informative Conclusions for Argumentative Texts
Shahbaz Syed | Khalid Al Khatib | Milad Alshomary | Henning Wachsmuth | Martin Potthast
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Mining Crowdsourcing Problems from Discussion Forums of Workers
Zahra Nouri | Henning Wachsmuth | Gregor Engels
Proceedings of the 28th International Conference on Computational Linguistics

Crowdsourcing is used in academia and industry to solve tasks that are easy for humans but hard for computers, in natural language processing mostly to annotate data. The quality of annotations is affected by problems in the task design, task operation, and task evaluation that workers face with requesters in crowdsourcing processes. To learn about the major problems, we provide a short but comprehensive survey based on two complementary studies: (1) a literature review where we collect and organize problems known from interviews with workers, and (2) an empirical data analysis where we use topic modeling to mine workers’ complaints from a new English corpus of workers’ forum discussions. While literature covers all process phases, problems in the task evaluation are prevalent, including unfair rejections, late payments, and unjustified blockings of workers. According to the data, however, poor task design in terms of malfunctioning environments, bad workload estimation, and privacy violations seems to bother the workers most. Our findings form the basis for future research on how to improve crowdsourcing processes.

pdf bib
Intrinsic Quality Assessment of Arguments
Henning Wachsmuth | Till Werner
Proceedings of the 28th International Conference on Computational Linguistics

Several quality dimensions of natural language arguments have been investigated. Some are likely to be reflected in linguistic features (e.g., an argument’s arrangement), whereas others depend on context (e.g., relevance) or topic knowledge (e.g., acceptability). In this paper, we study the intrinsic computational assessment of 15 dimensions, i.e., only learning from an argument’s text. In systematic experiments with eight feature types on an existing corpus, we observe moderate but significant learning success for most dimensions. Rhetorical quality seems hardest to assess, and subjectivity features turn out strong, although length bias in the corpus impedes full validity. We also find that human assessors differ more clearly to each other than to our approach.

pdf bib
Semi-Supervised Cleansing of Web Argument Corpora
Jonas Dorsch | Henning Wachsmuth
Proceedings of the 7th Workshop on Argument Mining

Debate portals and similar web platforms constitute one of the main text sources in computational argumentation research and its applications. While the corpora built upon these sources are rich of argumentatively relevant content and structure, they also include text that is irrelevant, or even detrimental, to their purpose. In this paper, we present a precision-oriented approach to detecting such irrelevant text in a semi-supervised way. Given a few seed examples, the approach automatically learns basic lexical patterns of relevance and irrelevance and then incrementally bootstraps new patterns from sentences matching the patterns. In the existing args.me corpus with 400k argumentative texts, our approach detects almost 87k irrelevant sentences, at a precision of 0.97 according to manual evaluation. With low effort, the approach can be adapted to other web argument corpora, providing a generic way to improve corpus quality.

pdf bib
Argument from Old Man’s View: Assessing Social Bias in Argumentation
Maximilian Spliethöver | Henning Wachsmuth
Proceedings of the 7th Workshop on Argument Mining

Social bias in language - towards genders, ethnicities, ages, and other social groups - poses a problem with ethical impact for many NLP applications. Recent research has shown that machine learning models trained on respective data may not only adopt, but even amplify the bias. So far, however, little attention has been paid to bias in computational argumentation. In this paper, we study the existence of social biases in large English debate portals. In particular, we train word embedding models on portal-specific corpora and systematically evaluate their bias using WEAT, an existing metric to measure bias in word embeddings. In a word co-occurrence analysis, we then investigate causes of bias. The results suggest that all tested debate corpora contain unbalanced and biased data, mostly in favor of male people with European-American names. Our empirical insights contribute towards an understanding of bias in argumentative data sources.

pdf bib
Detecting Media Bias in News Articles using Gaussian Bias Distributions
Wei-Fan Chen | Khalid Al Khatib | Benno Stein | Henning Wachsmuth
Findings of the Association for Computational Linguistics: EMNLP 2020

Media plays an important role in shaping public opinion. Biased media can influence people in undesirable directions and hence should be unmasked as such. We observe that feature-based and neural text classification approaches which rely only on the distribution of low-level lexical information fail to detect media bias. This weakness becomes most noticeable for articles on new events, where words appear in new contexts and hence their “bias predictiveness” is unclear. In this paper, we therefore study how second-order information about biased statements in an article helps to improve detection effectiveness. In particular, we utilize the probability distributions of the frequency, positions, and sequential order of lexical and informational sentence-level bias in a Gaussian Mixture Model. On an existing media bias dataset, we find that the frequency and positions of biased statements strongly impact article-level bias, whereas their exact sequential order is secondary. Using a standard model for sentence-level bias detection, we provide empirical evidence that article-level bias detectors that use second-order information clearly outperform those without.

pdf bib
Task Proposal: Abstractive Snippet Generation for Web Pages
Shahbaz Syed | Wei-Fan Chen | Matthias Hagen | Benno Stein | Henning Wachsmuth | Martin Potthast
Proceedings of the 13th International Conference on Natural Language Generation

We propose a shared task on abstractive snippet generation for web pages, a novel task of generating query-biased abstractive summaries for documents that are to be shown on a search results page. Conventional snippets are extractive in nature, which recently gave rise to copyright claims from news publishers as well as a new copyright legislation being passed in the European Union, limiting the fair use of web page contents for snippets. At the same time, abstractive summarization has matured considerably in recent years, potentially allowing for more personalization of snippets in the future. Taken together, these facts render further research into generating abstractive snippets both timely and promising.

pdf bib
Analyzing the Persuasive Effect of Style in News Editorial Argumentation
Roxanne El Baff | Henning Wachsmuth | Khalid Al Khatib | Benno Stein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

News editorials argue about political issues in order to challenge or reinforce the stance of readers with different ideologies. Previous research has investigated such persuasive effects for argumentative content. In contrast, this paper studies how important the style of news editorials is to achieve persuasion. To this end, we first compare content- and style-oriented classifiers on editorials from the liberal NYTimes with ideology-specific effect annotations. We find that conservative readers are resistant to NYTimes style, but on liberals, style even has more impact than content. Focusing on liberals, we then cluster the leads, bodies, and endings of editorials, in order to learn about writing style patterns of effective argumentation.

pdf bib
Target Inference in Argument Conclusion Generation
Milad Alshomary | Shahbaz Syed | Martin Potthast | Henning Wachsmuth
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In argumentation, people state premises to reason towards a conclusion. The conclusion conveys a stance towards some target, such as a concept or statement. Often, the conclusion remains implicit, though, since it is self-evident in a discussion or left out for rhetorical reasons. However, the conclusion is key to understanding an argument and, hence, to any application that processes argumentation. We thus study the question to what extent an argument’s conclusion can be reconstructed from its premises. In particular, we argue here that a decisive step is to infer a conclusion’s target, and we hypothesize that this target is related to the premises’ targets. We develop two complementary target inference approaches: one ranks premise targets and selects the top-ranked target as the conclusion target, the other finds a new conclusion target in a learned embedding space using a triplet neural network. Our evaluation on corpora from two domains indicates that a hybrid of both approaches is best, outperforming several strong baselines. According to human annotators, we infer a reasonably adequate conclusion target in 89% of the cases.

pdf bib
Persuasiveness of News Editorials depending on Ideology and Personality
Roxanne El Baff | Khalid Al Khatib | Benno Stein | Henning Wachsmuth
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media

News editorials aim to shape the opinions of their readership and the general public on timely controversial issues. The impact of an editorial on the reader’s opinion does not only depend on its content and style, but also on the reader’s profile. Previous work has studied the effect of editorial style depending on general political ideologies (liberals vs.conservatives). In our work, we dig deeper into the persuasiveness of both content and style, exploring the role of the intensity of an ideology (lean vs.extreme) and the reader’s personality traits (agreeableness, conscientiousness, extraversion, neuroticism, and openness). Concretely, we train content- and style-based models on New York Times editorials for different ideology- and personality-specific groups. Our results suggest that particularly readers with extreme ideology and non “role model” personalities are impacted by style. We further analyze the importance of various text features with respect to the editorials’ impact, the readers’ profile, and the editorials’ geographical scope.

pdf bib
SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles
Giovanni Da San Martino | Alberto Barrón-Cedeño | Henning Wachsmuth | Rostislav Petrov | Preslav Nakov
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the results and the main findings of SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. The task featured two subtasks. Subtask SI is about Span Identification: given a plain-text document, spot the specific text fragments containing propaganda. Subtask TC is about Technique Classification: given a specific text fragment, in the context of a full document, determine the propaganda technique it uses, choosing from an inventory of 14 possible propaganda techniques. The task attracted a large number of participants: 250 teams signed up to participate and 44 made a submission on the test set. In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For both subtasks, the best systems used pre-trained Transformers and ensembles.

pdf bib
Analyzing Political Bias and Unfairness in News Articles at Different Levels of Granularity
Wei-Fan Chen | Khalid Al Khatib | Henning Wachsmuth | Benno Stein
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Media is an indispensable source of information and opinion, shaping the beliefs and attitudes of our society. Obviously, media portals can also provide overly biased content, e.g., by reporting on political events in a selective or incomplete manner. A relevant question hence is whether and how such a form of unfair news coverage can be exposed. This paper addresses the automatic detection of bias, but it goes one step further in that it explores how political bias and unfairness are manifested linguistically. We utilize a new corpus of 6964 news articles with labels derived from adfontesmedia.com to develop a neural model for bias assessment. Analyzing the model on article excerpts, we find insightful bias patterns at different levels of text granularity, from single words to the whole article discourse.

2019

pdf bib
Modeling Frames in Argumentation
Yamen Ajjour | Milad Alshomary | Henning Wachsmuth | Benno Stein
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In argumentation, framing is used to emphasize a specific aspect of a controversial topic while concealing others. When talking about legalizing drugs, for instance, its economical aspect may be emphasized. In general, we call a set of arguments that focus on the same aspect a frame. An argumentative text has to serve the “right” frame(s) to convince the audience to adopt the author’s stance (e.g., being pro or con legalizing drugs). More specifically, an author has to choose frames that fit the audience’s cultural background and interests. This paper introduces frame identification, which is the task of splitting a set of arguments into non-overlapping frames. We present a fully unsupervised approach to this task, which first removes topical information and then identifies frames using clustering. For evaluation purposes, we provide a corpus with 12, 326 debate-portal arguments, organized along the frames of the debates’ topics. On this corpus, our approach outperforms different strong baselines, achieving an F1-score of 0.28.

pdf bib
Unraveling the Search Space of Abusive Language in Wikipedia with Dynamic Lexicon Acquisition
Wei-Fan Chen | Khalid Al Khatib | Matthias Hagen | Henning Wachsmuth | Benno Stein
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Many discussions on online platforms suffer from users offending others by using abusive terminology, threatening each other, or being sarcastic. Since an automatic detection of abusive language can support human moderators of online discussion platforms, detecting abusiveness has recently received increased attention. However, the existing approaches simply train one classifier for the whole variety of abusiveness. In contrast, our approach is to distinguish explicitly abusive cases from the more “shadowed” ones. By dynamically extending a lexicon of abusive terms (e.g., including new obfuscations of abusive terms), our approach can support a moderator with explicit unraveled explanations for why something was flagged as abusive: due to known explicitly abusive terms, due to newly detected (obfuscated) terms, or due to shadowed cases.

pdf bib
Proceedings of the 6th Workshop on Argument Mining
Benno Stein | Henning Wachsmuth
Proceedings of the 6th Workshop on Argument Mining

pdf bib
Computational Argumentation Synthesis as a Language Modeling Task
Roxanne El Baff | Henning Wachsmuth | Khalid Al Khatib | Manfred Stede | Benno Stein
Proceedings of the 12th International Conference on Natural Language Generation

Synthesis approaches in computational argumentation so far are restricted to generating claim-like argument units or short summaries of debates. Ultimately, however, we expect computers to generate whole new arguments for a given stance towards some topic, backing up claims following argumentative and rhetorical considerations. In this paper, we approach such an argumentation synthesis as a language modeling task. In our language model, argumentative discourse units are the “words”, and arguments represent the “sentences”. Given a pool of units for any unseen topic-stance pair, the model selects a set of unit types according to a basic rhetorical strategy (logos vs. pathos), arranges the structure of the types based on the units’ argumentative roles, and finally “phrases” an argument by instantiating the structure with semantically coherent units from the pool. Our evaluation suggests that the model can, to some extent, mimic the human synthesis of strategy-specific arguments.

2018

pdf bib
Argumentation Synthesis following Rhetorical Strategies
Henning Wachsmuth | Manfred Stede | Roxanne El Baff | Khalid Al-Khatib | Maria Skeppstedt | Benno Stein
Proceedings of the 27th International Conference on Computational Linguistics

Persuasion is rarely achieved through a loose set of arguments alone. Rather, an effective delivery of arguments follows a rhetorical strategy, combining logical reasoning with appeals to ethics and emotion. We argue that such a strategy means to select, arrange, and phrase a set of argumentative discourse units. In this paper, we model rhetorical strategies for the computational synthesis of effective argumentation. In a study, we let 26 experts synthesize argumentative texts with different strategies for 10 topics. We find that the experts agree in the selection significantly more when following the same strategy. While the texts notably vary for different strategies, especially their arrangement remains stable. The results suggest that our model enables a strategical synthesis.

pdf bib
Before Name-Calling: Dynamics and Triggers of Ad Hominem Fallacies in Web Argumentation
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Arguing without committing a fallacy is one of the main requirements of an ideal debate. But even when debating rules are strictly enforced and fallacious arguments punished, arguers often lapse into attacking the opponent by an ad hominem argument. As existing research lacks solid empirical investigation of the typology of ad hominem arguments as well as their potential causes, this paper fills this gap by (1) performing several large-scale annotation studies, (2) experimenting with various neural architectures and validating our working hypotheses, such as controversy or reasonableness, and (3) providing linguistic insights into triggers of ad hominem using explainable neural network architectures.

pdf bib
The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly contextualized, warrants are usually presupposed and left implicit. Thus, the comprehension does not only require language understanding and logic skills, but also depends on common sense. In this paper we develop a methodology for reconstructing warrants systematically. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new challenging task, the argument reasoning comprehension task. Given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task will define a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice.

pdf bib
Challenge or Empower: Revisiting Argumentation Quality in a News Editorial Corpus
Roxanne El Baff | Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of the 22nd Conference on Computational Natural Language Learning

News editorials are said to shape public opinion, which makes them a powerful tool and an important source of political argumentation. However, rarely do editorials change anyone’s stance on an issue completely, nor do they tend to argue explicitly (but rather follow a subtle rhetorical strategy). So, what does argumentation quality mean for editorials then? We develop the notion that an effective editorial challenges readers with opposing stance, and at the same time empowers the arguing skills of readers that share the editorial’s stance — or even challenges both sides. To study argumentation quality based on this notion, we introduce a new corpus with 1000 editorials from the New York Times, annotated for their perceived effect along with the annotators’ political orientations. Analyzing the corpus, we find that annotators with different orientation disagree on the effect significantly. While only 1% of all editorials changed anyone’s stance, more than 5% meet our notion. We conclude that our corpus serves as a suitable resource for studying the argumentation quality of news editorials.

pdf bib
Retrieval of the Best Counterargument without Prior Topic Knowledge
Henning Wachsmuth | Shahbaz Syed | Benno Stein
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Given any argument on any controversial topic, how to counter it? This question implies the challenging retrieval task of finding the best counterargument. Since prior knowledge of a topic cannot be expected in general, we hypothesize the best counterargument to invoke the same aspects as the argument while having the opposite stance. To operationalize our hypothesis, we simultaneously model the similarity and dissimilarity of pairs of arguments, based on the words and embeddings of the arguments’ premises and conclusions. A salient property of our model is its independence from the topic at hand, i.e., it applies to arbitrary arguments. We evaluate different model variations on millions of argument pairs derived from the web portal idebate.org. Systematic ranking experiments suggest that our hypothesis is true for many arguments: For 7.6 candidates with opposing stance on average, we rank the best counterargument highest with 60% accuracy. Even among all 2801 test set pairs as candidates, we still find the best one about every third time.

pdf bib
Modeling Deliberative Argumentation Strategies on Wikipedia
Khalid Al-Khatib | Henning Wachsmuth | Kevin Lang | Jakob Herpel | Matthias Hagen | Benno Stein
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper studies how the argumentation strategies of participants in deliberative discussions can be supported computationally. Our ultimate goal is to predict the best next deliberative move of each participant. In this paper, we present a model for deliberative discussions and we illustrate its operationalization. Previous models have been built manually based on a small set of discussions, resulting in a level of abstraction that is not suitable for move recommendation. In contrast, we derive our model statistically from several types of metadata that can be used for move description. Applied to six million discussions from Wikipedia talk pages, our approach results in a model with 13 categories along three dimensions: discourse acts, argumentative relations, and frames. On this basis, we automatically generate a corpus with about 200,000 turns, labeled for the 13 categories. We then operationalize the model with three supervised classifiers and provide evidence that the proposed categories can be predicted.

pdf bib
Learning to Flip the Bias of News Headlines
Wei-Fan Chen | Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of the 11th International Conference on Natural Language Generation

This paper introduces the task of “flipping” the bias of news articles: Given an article with a political bias (left or right), generate an article with the same topic but opposite bias. To study this task, we create a corpus with bias-labeled articles from all-sides.com. As a first step, we analyze the corpus and discuss intrinsic characteristics of bias. They point to the main challenges of bias flipping, which in turn lead to a specific setting in the generation process. The paper in hand narrows down the general bias flipping task to focus on bias flipping for news article headlines. A manual annotation of headlines from each side reveals that they are self-informative in general and often convey bias. We apply an autoencoder incorporating information from an article’s content to learn how to automatically flip the bias. From 200 generated headlines, 73 are classified as understandable by annotators, and 83 maintain the topic while having opposite bias. Insights from our analysis shed light on how to solve the main challenges of bias flipping.

pdf bib
Visualization of the Topic Space of Argument Search Results in args.me
Yamen Ajjour | Henning Wachsmuth | Dora Kiesel | Patrick Riehmann | Fan Fan | Giuliano Castiglia | Rosemary Adejoh | Bernd Fröhlich | Benno Stein
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In times of fake news and alternative facts, pro and con arguments on controversial topics are of increasing importance. Recently, we presented args.me as the first search engine for arguments on the web. In its initial version, args.me ranked arguments solely by their relevance to a topic queried for, making it hard to learn about the diverse topical aspects covered by the search results. To tackle this shortcoming, we integrated a visualization interface for result exploration in args.me that provides an instant overview of the main aspects in a barycentric coordinate system. This topic space is generated ad-hoc from controversial issues on Wikipedia and argument-specific LDA models. In two case studies, we demonstrate how individual arguments can be found easily through interactions with the visualization, such as highlighting and filtering.

pdf bib
SemEval-2018 Task 12: The Argument Reasoning Comprehension Task
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of The 12th International Workshop on Semantic Evaluation

A natural language argument is composed of a claim as well as reasons given as premises for the claim. The warrant explaining the reasoning is usually left implicit, as it is clear from the context and common sense. This makes a comprehension of arguments easy for humans but hard for machines. This paper summarizes the first shared task on argument reasoning comprehension. Given a premise and a claim along with some topic information, the goal was to automatically identify the correct warrant among two candidates that are plausible and lexically close, but in fact imply opposite claims. We describe the dataset with 1970 instances that we built for the task, and we outline the 21 computational approaches that participated, most of which used neural networks. The results reveal the complexity of the task, with many approaches hardly improving over the random accuracy of about 0.5. Still, the best observed accuracy (0.712) underlines the principle feasibility of identifying warrants. Our analysis indicates that an inclusion of external knowledge is key to reasoning comprehension.

2017

pdf bib
Building an Argument Search Engine for the Web
Henning Wachsmuth | Martin Potthast | Khalid Al-Khatib | Yamen Ajjour | Jana Puschmann | Jiani Qu | Jonas Dorsch | Viorel Morari | Janek Bevendorff | Benno Stein
Proceedings of the 4th Workshop on Argument Mining

Computational argumentation is expected to play a critical role in the future of web search. To make this happen, many search-related questions must be revisited, such as how people query for arguments, how to mine arguments from the web, or how to rank them. In this paper, we develop an argument search framework for studying these and further questions. The framework allows for the composition of approaches to acquiring, mining, assessing, indexing, querying, retrieving, ranking, and presenting arguments while relying on standard infrastructure and interfaces. Based on the framework, we build a prototype search engine, called args, that relies on an initial, freely accessible index of nearly 300k arguments crawled from reliable web resources. The framework and the argument search engine are intended as an environment for collaborative research on computational argumentation and its practical evaluation.

pdf bib
Unit Segmentation of Argumentative Texts
Yamen Ajjour | Wei-Fan Chen | Johannes Kiesel | Henning Wachsmuth | Benno Stein
Proceedings of the 4th Workshop on Argument Mining

The segmentation of an argumentative text into argument units and their non-argumentative counterparts is the first step in identifying the argumentative structure of the text. Despite its importance for argument mining, unit segmentation has been approached only sporadically so far. This paper studies the major parameters of unit segmentation systematically. We explore the effectiveness of various features, when capturing words separately, along with their neighbors, or even along with the entire text. Each such context is reflected by one machine learning model that we evaluate within and across three domains of texts. Among the models, our new deep learning approach capturing the entire text turns out best within all domains, with an F-score of up to 88.54. While structural features generalize best across domains, the domain transfer remains hard, which points to major challenges of unit segmentation.

pdf bib
Argumentation Quality Assessment: Theory vs. Practice
Henning Wachsmuth | Nona Naderi | Ivan Habernal | Yufang Hou | Graeme Hirst | Iryna Gurevych | Benno Stein
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Argumentation quality is viewed differently in argumentation theory and in practical assessment approaches. This paper studies to what extent the views match empirically. We find that most observations on quality phrased spontaneously are in fact adequately represented by theory. Even more, relative comparisons of arguments in practice correlate with absolute quality ratings based on theory. Our results clarify how the two views can learn from each other.

pdf bib
Patterns of Argumentation Strategies across Topics
Khalid Al-Khatib | Henning Wachsmuth | Matthias Hagen | Benno Stein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper presents an analysis of argumentation strategies in news editorials within and across topics. Given nearly 29,000 argumentative editorials from the New York Times, we develop two machine learning models, one for determining an editorial’s topic, and one for identifying evidence types in the editorial. Based on the distribution and structure of the identified types, we analyze the usage patterns of argumentation strategies among 12 different topics. We detect several common patterns that provide insights into the manifestation of argumentation strategies. Also, our experiments reveal clear correlations between the topics and the detected patterns.

pdf bib
The Impact of Modeling Overall Argumentation with Tree Kernels
Henning Wachsmuth | Giovanni Da San Martino | Dora Kiesel | Benno Stein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Several approaches have been proposed to model either the explicit sequential structure of an argumentative text or its implicit hierarchical structure. So far, the adequacy of these models of overall argumentation remains unclear. This paper asks what type of structure is actually important to tackle downstream tasks in computational argumentation. We analyze patterns in the overall argumentation of texts from three corpora. Then, we adapt the idea of positional tree kernels in order to capture sequential and hierarchical argumentative structure together for the first time. In systematic experiments for three text classification tasks, we find strong evidence for the impact of both types of structure. Our results suggest that either of them is necessary while their combination may be beneficial.

pdf bib
Computational Argumentation Quality Assessment in Natural Language
Henning Wachsmuth | Nona Naderi | Yufang Hou | Yonatan Bilu | Vinodkumar Prabhakaran | Tim Alberdingk Thijm | Graeme Hirst | Benno Stein
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Research on computational argumentation faces the problem of how to automatically assess the quality of an argument or argumentation. While different quality dimensions have been approached in natural language processing, a common understanding of argumentation quality is still missing. This paper presents the first holistic work on computational argumentation quality in natural language. We comprehensively survey the diverse existing theories and approaches to assess logical, rhetorical, and dialectical quality dimensions, and we derive a systematic taxonomy from these. In addition, we provide a corpus with 320 arguments, annotated for all 15 dimensions in the taxonomy. Our results establish a common ground for research on computational argumentation quality assessment.

pdf bib
PageRank” for Argument Relevance
Henning Wachsmuth | Benno Stein | Yamen Ajjour
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Future search engines are expected to deliver pro and con arguments in response to queries on controversial topics. While argument mining is now in the focus of research, the question of how to retrieve the relevant arguments remains open. This paper proposes a radical model to assess relevance objectively at web scale: the relevance of an argument’s conclusion is decided by what other arguments reuse it as a premise. We build an argument graph for this model that we analyze with a recursive weighting scheme, adapting key ideas of PageRank. In experiments on a large ground-truth argument graph, the resulting relevance scores correlate with human average judgments. We outline what natural language challenges must be faced at web scale in order to stepwise bring argument relevance to web search engines.

pdf bib
WAT-SL: A Customizable Web Annotation Tool for Segment Labeling
Johannes Kiesel | Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

A frequent type of annotations in text corpora are labeled text segments. General-purpose annotation tools tend to be overly comprehensive, often making the annotation process slower and more error-prone. We present WAT-SL, a new web-based tool that is dedicated to segment labeling and highly customizable to the labeling task at hand. We outline its main features and exemplify how we used it for a crowdsourced corpus with labeled argument units.

2016

pdf bib
Cross-Domain Mining of Argumentative Text through Distant Supervision
Khalid Al-Khatib | Henning Wachsmuth | Matthias Hagen | Jonas Köhler | Benno Stein
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Using Argument Mining to Assess the Argumentation Quality of Essays
Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Argument mining aims to determine the argumentative structure of texts. Although it is said to be crucial for future applications such as writing support systems, the benefit of its output has rarely been evaluated. This paper puts the analysis of the output into the focus. In particular, we investigate to what extent the mined structure can be leveraged to assess the argumentation quality of persuasive essays. We find insightful statistical patterns in the structure of essays. From these, we derive novel features that we evaluate in four argumentation-related essay scoring tasks. Our results reveal the benefit of argument mining for assessing argumentation quality. Among others, we improve the state of the art in scoring an essay’s organization and its argument strength.

pdf bib
A News Editorial Corpus for Mining Argumentation Strategies
Khalid Al-Khatib | Henning Wachsmuth | Johannes Kiesel | Matthias Hagen | Benno Stein
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Many argumentative texts, and news editorials in particular, follow a specific strategy to persuade their readers of some opinion or attitude. This includes decisions such as when to tell an anecdote or where to support an assumption with statistics, which is reflected by the composition of different types of argumentative discourse units in a text. While several argument mining corpora have recently been published, they do not allow the study of argumentation strategies due to incomplete or coarse-grained unit annotations. This paper presents a novel corpus with 300 editorials from three diverse news portals that provides the basis for mining argumentation strategies. Each unit in all editorials has been assigned one of six types by three annotators with a high Fleiss’ Kappa agreement of 0.56. We investigate various challenges of the annotation process and we conduct a first corpus analysis. Our results reveal different strategies across the news portals, exemplifying the benefit of studying editorials—a so far underresourced text genre in argument mining.

2015

pdf bib
Sentiment Flow - A General Model of Web Review Argumentation
Henning Wachsmuth | Johannes Kiesel | Benno Stein
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Modeling Review Argumentation for Robust Sentiment Analysis
Henning Wachsmuth | Martin Trenkmann | Benno Stein | Gregor Engels
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Learning Efficient Information Extraction on Heterogeneous Texts
Henning Wachsmuth | Benno Stein | Gregor Engels
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

pdf bib
Optimal Scheduling of Information Extraction Algorithms
Henning Wachsmuth | Benno Stein
Proceedings of COLING 2012: Posters

2011

pdf bib
Back to the Roots of Genres: Text Classification by Language Function
Henning Wachsmuth | Kathrin Bujna
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
Efficient Statement Identification for Automatic Market Forecasting
Henning Wachsmuth | Peter Prettenhofer | Benno Stein
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)