Many computational argumentation tasks, such as stance classification, are topic-dependent: The effectiveness of approaches to these tasks depends largely on whether they are trained with arguments on the same topics as those on which they are tested. The key question is: What are these training topics? To answer this question, we take the first step of mapping the argumentation landscape with The Argument Ontology (TAO). TAO draws on three authoritative sources for argument topics: the World Economic Forum, Wikipedia’s list of controversial topics, and Debatepedia. By comparing the topics in our ontology with those in 59 argument corpora, we perform the first comprehensive assessment of their topic coverage. While TAO already covers most of the corpus topics, the corpus topics cover far from all the topics in TAO. This points to a new goal for corpus construction: achieving broad topic coverage and thus better generalizability of computational argumentation approaches.
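A minimal sketch of how such a two-directional coverage assessment could be computed, assuming topics are available as normalized strings; the sets and values below are illustrative, not figures from TAO or the 59 corpora.

```python
# Hypothetical sketch of a two-directional topic-coverage check,
# assuming topics are represented as normalized strings.

def coverage(covering: set[str], covered: set[str]) -> float:
    """Fraction of `covered` topics that also appear in `covering`."""
    if not covered:
        return 0.0
    return len(covering & covered) / len(covered)

# Toy data; real topic sets would come from TAO and the corpora.
ontology_topics = {"economy", "health", "education", "climate change"}
corpus_topics = {"economy", "climate change"}

# The ontology covering the corpus topics (high in the paper) ...
print(coverage(ontology_topics, corpus_topics))  # 1.0
# ... versus the corpus topics covering the ontology (low in the paper).
print(coverage(corpus_topics, ontology_topics))  # 0.5
```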
We contribute to the ArgMining 2021 shared task on Quantitative Summarization and Key Point Analysis with two approaches for argument key point matching. For key point matching, the task is to decide whether a short key point matches the content of an argument with the same topic and stance towards the topic. We approach this task in two ways: First, we develop a simple rule-based baseline matcher by computing token overlap after stop word removal, stemming, and synonym/antonym expansion. Second, we fine-tune pretrained BERT and RoBERTa language models as regression classifiers for only a single epoch. We manually examine errors of our proposed matcher models and find that long arguments are harder to classify. Our fine-tuned RoBERTa-Base model achieves a mean average precision score of 0.913, the best score for strict labels of all participating teams.
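A minimal sketch of a rule-based overlap matcher in the spirit of the baseline described above, using NLTK; the exact preprocessing and scoring of the shared-task submission may differ, so treat the details as assumptions.

```python
# Sketch of a token-overlap matcher with stop word removal, stemming,
# and WordNet-based synonym/antonym expansion. Illustrative only.
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer

# One-time setup:
# nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
STOP = set(stopwords.words("english"))
STEM = PorterStemmer()

def expand(tokens):
    """Stem tokens and add stemmed WordNet synonyms/antonyms."""
    expanded = set()
    for tok in tokens:
        expanded.add(STEM.stem(tok))
        for syn in wordnet.synsets(tok):
            for lemma in syn.lemmas():
                expanded.add(STEM.stem(lemma.name().lower()))
                for ant in lemma.antonyms():
                    expanded.add(STEM.stem(ant.name().lower()))
    return expanded

def overlap_score(argument: str, key_point: str) -> float:
    """Fraction of the (expanded) key point tokens found in the argument."""
    arg = [t for t in nltk.word_tokenize(argument.lower())
           if t.isalpha() and t not in STOP]
    kp = [t for t in nltk.word_tokenize(key_point.lower())
          if t.isalpha() and t not in STOP]
    arg_set, kp_set = expand(arg), expand(kp)
    return len(arg_set & kp_set) / max(len(kp_set), 1)
```

A match would then be decided by thresholding this score, with the threshold tuned on the development data.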
In argumentation, framing is used to emphasize a specific aspect of a controversial topic while concealing others. When talking about legalizing drugs, for instance, its economic aspect may be emphasized. In general, we call a set of arguments that focus on the same aspect a frame. An argumentative text has to serve the “right” frame(s) to convince the audience to adopt the author’s stance (e.g., being pro or con legalizing drugs). More specifically, an author has to choose frames that fit the audience’s cultural background and interests. This paper introduces frame identification, which is the task of splitting a set of arguments into non-overlapping frames. We present a fully unsupervised approach to this task, which first removes topical information and then identifies frames using clustering. For evaluation purposes, we provide a corpus with 12,326 debate-portal arguments, organized along the frames of the debates’ topics. On this corpus, our approach outperforms several strong baselines, achieving an F1-score of 0.28.
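A simplified sketch of the remove-topic-then-cluster idea; the paper's actual topic-removal and clustering methods may differ, so TF-IDF and k-means here are illustrative stand-ins rather than the reported approach.

```python
# Sketch: strip topic-naming tokens, then cluster the remaining content
# so clusters form around aspects (frames) rather than the topic itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def identify_frames(arguments, topic, n_frames=5, seed=0):
    topic_terms = set(topic.lower().split())
    # Crude topic removal: drop tokens that name the topic itself.
    cleaned = [
        " ".join(t for t in arg.lower().split() if t not in topic_terms)
        for arg in arguments
    ]
    vectors = TfidfVectorizer(stop_words="english").fit_transform(cleaned)
    # Non-overlapping frames: each argument gets exactly one cluster label.
    return KMeans(n_clusters=n_frames, random_state=seed).fit_predict(vectors)
```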
In times of fake news and alternative facts, pro and con arguments on controversial topics are of increasing importance. Recently, we presented args.me as the first search engine for arguments on the web. In its initial version, args.me ranked arguments solely by their relevance to a topic queried for, making it hard to learn about the diverse topical aspects covered by the search results. To tackle this shortcoming, we integrated a visualization interface for result exploration in args.me that provides an instant overview of the main aspects in a barycentric coordinate system. This topic space is generated ad hoc from controversial issues on Wikipedia and argument-specific LDA models. In two case studies, we demonstrate how individual arguments can be found easily through interactions with the visualization, such as highlighting and filtering.
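A sketch of how an argument can be placed in such a barycentric topic space: each aspect is a vertex of a polygon, and the argument's position is the convex combination of the vertices weighted by its aspect distribution. The aspect weights would come from the LDA model; here they are assumed as given input, and the layout is illustrative.

```python
# Sketch: map an argument's aspect distribution to a 2D point inside
# a polygon whose corners represent the aspects.
import numpy as np

def barycentric_position(aspect_weights, vertices):
    """aspect_weights: shape (k,), non-negative (e.g., LDA topic weights).
    vertices: shape (k, 2), one 2D corner per aspect."""
    w = np.asarray(aspect_weights, dtype=float)
    w = w / w.sum()                       # normalize defensively
    return w @ np.asarray(vertices)       # convex combination of corners

# Three aspects arranged as a triangle; an argument mostly about aspect 0
# lands near that corner.
corners = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
print(barycentric_position([0.7, 0.2, 0.1], corners))
```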
Future search engines are expected to deliver pro and con arguments in response to queries on controversial topics. While argument mining is now a focus of research, the question of how to retrieve the relevant arguments remains open. This paper proposes a radical model to assess relevance objectively at web scale: the relevance of an argument’s conclusion is decided by what other arguments reuse it as a premise. We build an argument graph for this model that we analyze with a recursive weighting scheme, adapting key ideas of PageRank. In experiments on a large ground-truth argument graph, the resulting relevance scores correlate with average human judgments. We outline the natural language challenges that must be faced at web scale in order to bring argument relevance to web search engines step by step.
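A minimal power-iteration sketch of the PageRank-style idea: an argument gains relevance when relevant arguments reuse its conclusion as a premise. The edge direction, damping factor, and handling of dangling nodes are assumptions for illustration, not the paper's exact weighting scheme.

```python
# Sketch: PageRank-like recursive weighting over a conclusion-reuse graph.
def argument_rank(edges, nodes, damping=0.85, iters=50):
    """edges: (reusing_argument, reused_argument) pairs, where the first
    argument uses the second one's conclusion as a premise."""
    out_deg = {n: 0 for n in nodes}
    for src, _ in edges:
        out_deg[src] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, dst in edges:
            # Each argument distributes its score over the arguments
            # whose conclusions it reuses. Dangling mass is ignored here.
            new[dst] += damping * rank[src] / out_deg[src]
        rank = new
    return rank

# Toy graph: b and c both reuse a's conclusion, so a ranks highest.
print(argument_rank([("b", "a"), ("c", "a")], nodes=["a", "b", "c"]))
```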
Computational argumentation is expected to play a critical role in the future of web search. To make this happen, many search-related questions must be revisited, such as how people query for arguments, how to mine arguments from the web, or how to rank them. In this paper, we develop an argument search framework for studying these and further questions. The framework allows for the composition of approaches to acquiring, mining, assessing, indexing, querying, retrieving, ranking, and presenting arguments while relying on standard infrastructure and interfaces. Based on the framework, we build a prototype search engine, called args, that relies on an initial, freely accessible index of nearly 300k arguments crawled from reliable web resources. The framework and the argument search engine are intended as an environment for collaborative research on computational argumentation and its practical evaluation.
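A sketch of how such a framework's composition could look: each stage is a small interface, and a search pipeline chains them together. The stage names follow the abstract, but the interfaces themselves are illustrative assumptions, not the actual args API.

```python
# Sketch: composable stage interfaces for an argument search pipeline.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Argument:
    conclusion: str
    premises: list[str]
    source_url: str

class Miner(Protocol):
    """Extracts arguments from a web document."""
    def mine(self, document: str) -> list[Argument]: ...

class Ranker(Protocol):
    """Orders mined arguments by relevance to a query."""
    def rank(self, query: str, args: list[Argument]) -> list[Argument]: ...

def search(query: str, documents: list[str],
           miner: Miner, ranker: Ranker, top_k: int = 10) -> list[Argument]:
    """Compose mining and ranking; indexing/retrieval are elided here."""
    mined = [a for doc in documents for a in miner.mine(doc)]
    return ranker.rank(query, mined)[:top_k]
```

Swapping in different `Miner` or `Ranker` implementations is what would let researchers compare approaches against each other on shared infrastructure.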
The segmentation of an argumentative text into argument units and their non-argumentative counterparts is the first step in identifying the argumentative structure of the text. Despite its importance for argument mining, unit segmentation has been approached only sporadically so far. This paper studies the major parameters of unit segmentation systematically. We explore the effectiveness of various features when capturing words separately, along with their neighbors, or even along with the entire text. Each such context is reflected by one machine learning model that we evaluate within and across three domains of texts. Among the models, our new deep learning approach capturing the entire text turns out best within all domains, with an F-score of up to 88.54. While structural features generalize best across domains, domain transfer remains hard, which points to major challenges of unit segmentation.
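A sketch of the word-plus-neighbors feature idea, framed as per-token classification; the feature choices and window size are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch: features for one token plus a window of neighboring tokens,
# as input to a per-token segmentation classifier.
def token_features(tokens, i, window=2):
    feats = {"token": tokens[i].lower(),
             "is_capitalized": tokens[i][0].isupper()}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"tok_{off}"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats

# Each token would then receive a label (e.g., BIO tags marking argument
# unit boundaries) from a classifier trained on these features.
tokens = "We should ban smoking because it harms others .".split()
print(token_features(tokens, 4))
```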