Topic Ontologies for Arguments

Many computational argumentation tasks, such as stance classification, are topic-dependent: the effectiveness of approaches to these tasks depends largely on whether they are trained on arguments about the same topics as those on which they are tested. The key question is: What are these training topics? To answer this question, we take a first step toward mapping the argumentation landscape with The Argument Ontology (TAO). TAO draws on three authoritative sources for argument topics: the World Economic Forum, Wikipedia’s list of controversial topics, and Debatepedia. By comparing the topics in our ontology with those in 59 argument corpora, we perform the first comprehensive assessment of their topic coverage. While TAO already covers most of the corpus topics, the corpus topics cover only a small fraction of the topics in TAO. This points to a new goal for corpus construction: achieving broad topic coverage and thus better generalizability of computational argumentation approaches.


Introduction
The term "topic" refers to the subject matter of a text. A text may be about one or more topics, and the relationship between topics and texts is called "aboutness" (Yablo, 2014). Topics play a central role in argumentation because they determine argumentation strategies and rhetorical devices by setting the appropriate and expected universe of discourse. This view is supported by pragma-dialectics (van Eemeren, 2015): "The basic aspects of strategic maneuvering [...] are making an expedient selection from the 'topical potential' available at a certain discussion." Although debaters often use commonplace arguments across topics (Bilu et al., 2019), these must be relevant: a black-market argument, for example, can be applied equally well to topics such as banning drugs or banning firearms. As recently shown, for example, by Reuver et al. (2021), training computational models to extract, analyze, or generate arguments with a broad topic coverage improves their generalizability.
A set of topics can be organized as a graph, sometimes called a "topic space". Information theorists and library scientists map hierarchical subject relationships into ontologies in this way (Hjørland, 2001). For this purpose, topics are labeled with a subject heading, a phrase from a controlled vocabulary that describes a topic in a concise and discriminating manner. While library ontologies are not focused on argumentation, others deal specifically with argumentative topic spaces. We have identified and tapped three authoritative sources of ontological knowledge covering global issues, controversies, and popular debates: the World Economic Forum's "Strategic Intelligence" site, Wikipedia's list of controversial topics, and Debatepedia's debate classification system (Section 4). They form the basis for The Argument Ontology (TAO). We compile a comprehensive survey of 59 argument corpora (Section 3) and investigate their topic coverage with respect to the three authoritative ontologies (Section 5). The coverage of corpora with topic labels is manually assessed by matching each label with the topics of the ontologies. From this, the ontology topics covered by a corpus and the distribution of corpus arguments in the ontologies are calculated. Our analyses show that the existing corpora focus on only a subset of the known topics. For corpora without topic labels, we categorize their argumentative texts by measuring their semantic relatedness to ontology topics. Given the large number of ontology topics (748 for Wikipedia), this is a challenging classification task for which we achieve a remarkable F1 of 0.59 (Section 6).
Altogether, we lay the foundation for the study and systematic exploration of controversial topics within computational argumentation analysis. The authoritative sources identified already cover their respective areas quite comprehensively. Future work will need to extend our approach to other subject areas, such as business, domestic, historical, and scientific argument spaces.

Related Work
Our review of related work focuses on the role of the variable "topic" in computational argumentation. Moreover, we briefly review topic ontologies and hierarchical topic classification.

Topics in Computational Argumentation
In computational argumentation, arguments are typically modeled as compositions of argument units, where an argument unit is represented as a span of text. Habernal and Gurevych (2016a) adopt Toulmin's (1958) model, which defines six unit types, among them "claim" and "data". Wachsmuth et al. (2017) employ a more basic model of two units, which defines an argument as a claim or conclusion supported by one or more premises. These models capture arguments without explicitly identifying the topic they address. Levy et al. (2014) consider claims to be topic-dependent and study their detection in the context of a random selection of 32 topics from idebate.org. This work raises the question of why topic dependence has not been addressed more urgently until now.
Key tasks in computational argumentation include mining arguments from natural language (Moens et al., 2007; Al-Khatib et al., 2016), classifying their stances with regard to a thesis (Bar-Haim et al., 2017), and analyzing which arguments are more persuasive (Tan et al., 2016; Habernal and Gurevych, 2016a). Current approaches to these tasks rely on supervised classification. Daxenberger et al. (2017) show that supervised classifiers fail to generalize across domains (∼ topics). More recently, Stab et al. (2018) tweak a BiLSTM (Graves and Schmidhuber, 2005) to integrate the topic while jointly detecting (1) whether a sentence is an argument and (2) its stance toward the topic. The designed neural network outperforms a BiLSTM without topic integration in both tasks, providing further evidence for the topic dependence of argument mining and stance classification. Whether model transfer between more closely related topics works better is unknown. As a first step, Reuver et al. (2021) show that cross-topic stance classification with BERT (Devlin et al., 2018) produces mixed results depending on the topics, but they do not consider the relations between the topics. Gu et al. (2018) show that integrating the topic of an argument helps in assessing its persuasiveness.
The topic plays a central role in argument retrieval and generation, since it defines which arguments are relevant. Argument retrieval aims at delivering pro and con arguments for a given topic query. A major challenge in argument retrieval is grouping arguments that address common aspects of a topic. As shown by Reimers et al. (2019) and Ajjour et al. (2019a), integrating the topic is an important step when clustering arguments. For argument generation, Bilu et al. (2019) introduce an approach that matches an input topic against a list of topics that are paired with sets of topic-adjustable commonplace arguments (e.g., black-market arguments). In a similar vein, Bar-Haim et al. (2019) identify consistent and contrastive topics for a given topic with the goal of expanding the topic in a new direction (e.g., fast food versus obesity). Both approaches show the merit of utilizing argument topic ontologies in argument generation.
Only abstract argumentation may be truly topic-independent, since there only the structure and relations among arguments are studied, not their language.

Topic Ontologies
In information science, an ontology is defined as "an explicit specification of a conceptualization" (Gruber, 1993). Topic ontologies are a specific type of ontology that specifies topics as nodes of a directed acyclic graph. An edge in the graph then implies an "is part of" relation between the topics (Xamena et al., 2017). The effort invested in creating topic ontologies ranges from ad-hoc decisions (e.g., tags for blog posts) to extensive classification schemes for libraries. The oldest classification scheme still used in libraries today is the Dewey Decimal Classification. It has been translated into over 30 languages and contains several tens of thousands of classes. Most topic ontologies focus on a specific domain, such as the ACM Computing Classification System for computer science, or DMOZ for web pages. The only topic ontology directly linked to arguments is that of Debatepedia.

Hierarchical Text Classification
Hierarchical text classification aims at classifying a document into a class hierarchy. Depending on how the hierarchical structure is exploited, classification can be done top-down (from higher classes downwards), bottom-up, or flat (ignoring hierarchical relations) (Silla and Freitas, 2011). Researchers usually train supervised classifiers for each class in the hierarchy (Sun and Lim, 2001).
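As an illustration, the top-down scheme can be sketched as follows: one classifier per internal node routes a document to a child class. The keyword-matching "classifiers" and the topic names below are purely illustrative stand-ins for trained supervised models, not the classifiers used in the surveyed work.

```python
# Toy sketch of top-down hierarchical classification: one classifier per
# internal node routes a document downwards through the hierarchy.
def make_keyword_clf(keyword_map):
    # Stand-in for a trained supervised classifier at one node:
    # predicts the child class whose keywords occur most often.
    def clf(text):
        counts = {c: sum(text.count(w) for w in words)
                  for c, words in keyword_map.items()}
        return max(counts, key=counts.get)
    return clf

root_clf = make_keyword_clf({"environment": ["pollution", "climate"],
                             "politics": ["election", "law"]})
env_clf = make_keyword_clf({"pollution": ["plastic", "emissions"],
                            "climate": ["warming", "carbon"]})

def classify(text):
    level1 = root_clf(text)                               # top-down: root first
    level2 = env_clf(text) if level1 == "environment" else None
    return level1, level2

print(classify("plastic pollution in rivers"))  # ('environment', 'pollution')
```

A flat classifier would instead score all leaf classes at once, ignoring the hierarchy.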

Survey of Argument Corpora
To study arguments and computational argumentation tasks, researchers compile corpora of argumentative texts. To the best of our knowledge, Table 1 compiles all corpora dedicated to argumentation up to 2022. We review these corpora and their associated publications with regard to the sources of their arguments, the granularity of the corpus, the size of the corpora in terms of their units, and which and how many different topics are covered. Reviewing all papers citing a corpus, we also analyzed how many experiments were carried out using them.
The most elaborate discussion of topic selection is given in Habernal and Gurevych (2016a), who chose six topics (homeschooling, public versus private schools, redshirting, prayers in schools, single-sex education, mainstreaming) to focus on different education-related aspects. The broadest selection of topics is reported by the researchers of IBM Debater (https://www.research.ibm.com/haifa/dept/vst/debating_data.shtml), who obtain arguments from Wikipedia. However, samples of the topics have been used in their papers without mentioning which ones. The only other work mentioning its source of topics stems from Stab et al. (2018), who randomly select 8 topics from two lists of controversial topics that originate from an online library and the debate portal ProCon.org, respectively. Peldszus and Stede (2015) predefine a set of topics and give writers the freedom to choose which one to write about, but nothing is said about where the set of predefined topics originates from. Conrad et al. (2012) and Hasan and Ng (2014) explicitly select one and four topics, respectively. For all other corpora with topic labels, the authors do not justify their choice of topics, nor do they state selection or sampling criteria. Neither do the authors of corpora without topic labels.
Altogether, it appears that best practices in argumentation do not yet consider topic sampling as a prerequisite task for ensuring coverage of a certain domain of interest, diversity, or reproducibility. Based on our review, we presume three basic topic selection directives are in use today: (1) Manual selection. Topics are manually defined or selected. Although the process may be random, when aiming for controversial topics, one may often end up with commonplace topics of Western culture (e.g., abortion, death penalty, gay marriage). Still, they are relevant and important today.
(2) Source-driven (greedy within a time span). A source of argument ground truth is either exploited in its entirety, or a maximum subset fulfilling desired properties is used. Since argument-related ground truth is hard to come by, it is understandable that many readily available sources are being exploited. (3) Source-driven (sampled). A source of argument ground truth is exploited and a subset is sampled. Here, it may be infeasible to exploit a source in its entirety. Al-Khatib et al. (2016b) randomly select 300 documents from three websites. Park and Cardie (2018) and Stab and Gurevych (2017) do not mention anything about their sampling process. In general, both source-driven corpus construction approaches inevitably incur the source's idiosyncrasies of topic selection in terms of skew towards certain topics. Scaling up may or may not be a remedy for this problem.

Figure 1: Example of an assignment of arguments (bottom) to topics of a two-leveled ontology. Level 2 topics are subtopics of their linked Level 1 topics. Arguments linked to a Level 2 topic also pertain to its Level 1 ancestors.
We assess how many experiments have been reported on each of the corpora by collecting the publications referring to a corpus as per Google Scholar, focusing on conference and journal papers but excluding books and web pages. We then check whether the cited corpus is mentioned in the paper's data, experiment, or results section. As can be seen in Table 1, corpora with fewer topics tend to be used more often in experiments than those with larger numbers of topics. In total, 230 experiments were carried out on argument corpora with no clearly defined topic selection directive. The skew towards smaller-scale experiments may affect generalizability.

Bootstrapping The Argument Ontology
Topic ontologies provide a knowledge organization principle and, especially if widely accepted, also a standard. They are typically modeled as directed acyclic graphs, where nodes correspond to topics and edges indicate "is part of" relations. Topics that are part of other topics are called their subtopics. A topic ontology is often displayed in levels, starting with the topics that are not subtopics of others and continuing recursively with each lower level of subtopics. Figure 1 shows an excerpt of a two-level topic ontology for arguments.
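The structure just described (subtopic edges plus the propagation of arguments to ancestor topics, as in Figure 1) can be sketched as a small directed acyclic graph. The topic names below are illustrative examples, not actual TAO entries.

```python
# Minimal sketch of a topic ontology as a directed acyclic graph with
# "is part of" edges; arguments linked to a subtopic also pertain to
# all of its ancestor topics.
from collections import defaultdict

class TopicOntology:
    def __init__(self):
        self.parents = defaultdict(set)  # topic -> topics it is part of

    def add_subtopic(self, child, parent):
        self.parents[child].add(parent)

    def ancestors(self, topic):
        """All topics a given topic is (transitively) part of."""
        seen, stack = set(), [topic]
        while stack:
            for p in self.parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def topics_of(self, argument_topics):
        """Topics an argument pertains to, including all ancestors."""
        covered = set(argument_topics)
        for t in argument_topics:
            covered |= self.ancestors(t)
        return covered

onto = TopicOntology()
onto.add_subtopic("pollution", "environment")  # Level 2 -> Level 1
onto.add_subtopic("recycling", "environment")

print(onto.topics_of({"pollution"}))  # {'pollution', 'environment'}
```

Because the graph is acyclic, the ancestor traversal always terminates, and a topic may have several parents (a DAG, not a tree).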
The identification of the topics to be included in The Argument Ontology (TAO), as well as their relations, requires domain expertise. Building an all-encompassing ontology thus requires experts from every top-level domain where argumentation of scientific interest is expected. In the following, we suggest and outline three authoritative sources of expert topic ontologies, which comprise a wide selection of important argumentative topics. We use them to bootstrap a first version of TAO.
World Economic Forum (WEF) The World Economic Forum is a not-for-profit foundation that coordinates organizations from both the public and the private sector to work on economic and societal issues. As part of these efforts, their "Strategic Intelligence" platform strives to inform decision makers on domestic and global topics, specifically global issues (e.g., artificial intelligence and climate change), industries (e.g., healthcare delivery and private investors), and economies (e.g., Africa and ASEAN). Domain experts for each topic curate a stream of relevant news articles, which they each tag with 4-9 subtopics of their topic (e.g., the continuous monitoring of mental health).

Wikipedia Wikipedia strives for a neutral point of view, but many topics of public interest are discussed controversially. Some editors thus curate a list of controversial Wikipedia articles to highlight where special care is needed, grouped into 14 top-level topics (e.g., environment and philosophy) and 4-176 subtopics (e.g., creationism and pollution). We omit the "People" topic and articles on countries; their controversiality is not universal.

Debatepedia The Debatepedia portal's goal is to create an encyclopedia of debates, which are organized as "pro" and "con" arguments. A list of 89 topics helps visitors browse the debates. The debates are contributed by anonymous web users. Topics in Debatepedia tend to address issues of Western culture. For example, the topic "United States" covers 306 debates, while "Third World" covers only 12. The site is no longer maintained, but it is accessible through the Wayback Machine.
The three ontologies are publicly accessible, and two of them are actively maintained and updated. Acquiring the ontologies is straightforward; making use of them is not. A key task associated with every topic ontology is to categorize a given document. Having just a short string label describing a (potentially multifaceted) topic, such as "The Great Reset", renders this task exceedingly difficult. Fortunately, domain experts have been pre-categorizing documents into the aforementioned ontologies: for the WEF, invited domain experts categorize news articles for every topic; for Wikipedia, the text of the associated articles is available; and for Debatepedia, the associated debates are available.
Articles that are categorized into Level 2 topics are propagated up to their respective Level 1 topics. Table 3 shows the large differences between the ontologies. The WEF ontology contains the most topics, links the most documents, and has the most tokens overall. Wikipedia's Level 2 topics link to a single article each, yielding less text overall.

Topic Coverage
To assess the topic coverage of an argument corpus given the three ontologies, we map their topic labels (if provided) to matching ontology topics.

Topic Label Normalization
Table 1 lists 39 argument corpora that provide topic labels. Altogether, 2,259 different labels have been assigned. They are concise descriptions of the main issues of an argument, provided by the corpus authors. The labels possess the text register of the respective corpus: in essays, for instance, topics are usually thesis statements, while Wikipedia-derived corpora use article titles, and the topics of debate corpora include clichés such as "This house should". Often, topic labels express a stance towards a target issue, e.g., "ban guns". Five types of topic labels can be distinguished: concept, comparison of concepts, conclusion (including claim and thesis), question, and imperative. We normalize the topic labels by converting all concepts to singular form, removing clichés, and dropping stance-indicating words such as "legalize". Our normalization aims at retaining only the central target issue of a topic label and leads to 798 unique topic labels.
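The normalization steps (cliché removal, dropping stance words, singularization) can be sketched as follows. The cliché patterns, the stance word list, and the naive singularization rule are illustrative simplifications, not the actual resources used for the 798 normalized labels.

```python
import re

# Illustrative normalization of corpus topic labels; the cliché and stance
# word lists below are hypothetical examples, not the paper's full lists.
CLICHES = [r"^this house (should|believes|would)\s+", r"^should\s+", r"\?$"]
STANCE_WORDS = {"ban", "legalize", "abolish", "support", "oppose"}

def singularize(word):
    # Naive plural handling; a real pipeline would use a morphological tool.
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalize_label(label):
    label = label.lower().strip()
    for pattern in CLICHES:             # strip debate-register clichés
        label = re.sub(pattern, "", label).strip()
    tokens = [singularize(t) for t in label.split()
              if t not in STANCE_WORDS]  # drop stance-indicating words
    return " ".join(tokens)

print(normalize_label("This house should ban guns"))  # -> "gun"
```

The result retains only the central target issue of the label, so different stance-bearing labels about the same issue collapse onto one normalized form.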

Mapping Topic Labels to Ontology Topics
Using the preprocessed topic labels as queries, we retrieve for each topic label the 50 most relevant topics at each level of the three ontologies.
To facilitate the retrieval of ontology topics, we employ a BM25-weighted (Robertson et al., 2004) index of the concatenated documents for each topic. This enables us to narrow down the mapping of a topic label to a manageable size. Except for a handful of cases, 50 ontology topics can be retrieved for each topic label. The topic labels were then manually mapped to an ontology topic if they are synonyms, or if the former is a subtopic of the latter, which thus indicates that all arguments in the corpus with that topic label are about the ontology topic. A topic label can thus be mapped to multiple ontology topics. For example, the topic label "plastic bottles" is mapped to "pollution" and "recycling" in Wikipedia Level 2.
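The retrieval step can be sketched with a compact BM25 implementation: a topic label is used as a query over the concatenated documents of each ontology topic. The toy topic documents and the query label below are illustrative, not the actual index.

```python
import math
from collections import Counter

# Minimal BM25 ranking sketch (Okapi BM25 weighting); returns one score
# per document for the given query.
def bm25_scores(query, docs, k1=1.2, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    df = Counter()                       # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log(1 + (len(docs) - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Illustrative topic representations: each topic's documents concatenated.
topic_docs = {
    "pollution": "air pollution plastic waste water pollution emissions",
    "recycling": "plastic bottles recycling waste reuse materials",
    "education": "school curriculum teachers students learning",
}
label = "plastic bottles"
scores = bm25_scores(label, list(topic_docs.values()))
ranked = sorted(zip(topic_docs, scores), key=lambda x: -x[1])
print(ranked[0][0])  # most relevant ontology topic for the label
```

In practice, the top 50 topics per label retrieved this way are then checked manually for synonymy or subtopic relations.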

Analysis of Topic Coverage
Table 3 shows general statistics of this mapping of corpus topic labels to ontology topics. Most of the topic labels (2,141 out of 2,259) are mapped to at least one Debatepedia topic, while only 395 labels are mapped to WEF Level 2 topics. For Wikipedia Level 2, only 298 out of the 748 topics are actually covered by argument corpora. This first analysis already suggests that existing argument corpora often cover only a small subset of the possible argumentative topics that people are trained to debate. Those topic labels that can be mapped belong on average to 2.78 topics in Debatepedia, 1.24 topics in Wikipedia Level 1, and 1.53 topics in WEF Level 1. As discussed in Section 4, topics in Debatepedia focus on Western culture and are easily accessible, whereas topics in WEF require in-depth domain knowledge and have more global relevance. The broad coverage of Debatepedia's topics indicates that argument corpora focus on common, widely discussed topics rather than global issues or topics that require domain knowledge. For a more fine-grained analysis, Figure 2 illustrates the differences in the number of ontology topics covered by a corpus: while topics in Wikipedia Level 1 are covered well by some argument corpora, topics in Wikipedia and WEF Level 2 are covered only marginally. Note that topic coverage varies significantly between the corpora: the Claim Sentence Search dataset's topics cover 93% of the Wikipedia Level 1 topics, while the Ideological Debates Reasons dataset covers only 14%. The colors show the topic granularity of the corpus; especially the Record Debating Dataset 3 is fine-grained: as the highest value, 36 of its topics are mapped to the Wikipedia Level 1 category "Politics and Economics".
Figure 3 shows how the units of the 39 labeled corpora distribute over the top-matching topics in Debatepedia, Wikipedia Level 1, and WEF Level 1. Distributions over Level 2 are omitted for brevity and can be found in Figure 4 in the Appendix. The distribution is significantly skewed: while the top ten topics in Debatepedia are matched by 354,811 to 138,407 corpus units, the top ten topics in WEF Level 1 are matched by 344,345 to 28,725 corpus units. This supports our finding that the corpora cover easily accessible topics (e.g., "Media and Entertainment" and "Society").

Unit Categorization
The previous analysis assesses argument corpora that contain topic labels. About a third of the argument corpora do not. As a heuristic step towards assessing their topic coverage, we map ontology topics to each unit (Table 1) of an argument corpus by treating the unit as a (long) query in a standard information retrieval setup, where the ontology topics are the retrieval targets. The documents categorized into each topic have been concatenated and used as the topic's representation. Though the documents associated with a topic are not necessarily argumentative, they cover the salient topic aspects.

Table 3: Statistics for each topic ontology level: topics and topic documents (Section 4); count of mapped topic labels of the analyzed corpora for each ontology level; count of all ontology topics covered by the topic labels and the min, max, and mean count of covered ontology topics per topic label (Section 5); and the effectiveness of the approaches and baseline in unit categorization, in terms of precision, recall, and F1-score (Section 6).
To retrieve topics for a corpus unit, we implement and evaluate two approaches: Semantic Interpretation (SI) and SI with Text Embeddings (Text2vec-SI). The semantic interpretation approach computes the semantic similarity of a unit and a topic as the cosine similarity of the TF-IDF vectors of the unit and the topic's concatenated documents. This corresponds to the semantic interpretation step at the core of the well-known ESA model (Gabrilovich and Markovitch, 2007). Text2vec-SI calculates the similarity of topics and corpus units using BERT embeddings (Devlin et al., 2018). Following common practice, we take the dimension-wise average of the word embeddings of all tokens in the text. We tried other embeddings and approaches that performed similarly; their results can be found in the appendix. As a baseline, we implement a direct match approach, which assigns a unit an ontology topic if the topic's text appears in the unit's text (ignoring case).
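The semantic interpretation step can be sketched as follows: TF-IDF vectors are built for the unit and for each topic's concatenated documents, and topics are scored by cosine similarity. The toy topic texts and the unit are illustrative only.

```python
import math
from collections import Counter

# Sketch of the semantic-interpretation step: cosine similarity between
# TF-IDF vectors of a unit and each topic's concatenated documents.
def tfidf_vectors(texts):
    tokenized = [t.lower().split() for t in texts]
    df = Counter()                                 # document frequency
    for toks in tokenized:
        df.update(set(toks))
    n = len(texts)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Illustrative topic representations and argument unit.
topics = {"pollution": "plastic waste emissions air water pollution",
          "education": "school teachers students curriculum learning"}
unit = "single use plastic waste should be reduced"

vecs = tfidf_vectors(list(topics.values()) + [unit])
sims = {name: cosine(vecs[-1], v) for name, v in zip(topics, vecs[:-1])}
best = max(sims, key=sims.get)  # highest-scoring ontology topic
```

Text2vec-SI replaces the TF-IDF vectors with averaged BERT token embeddings, keeping the same cosine-similarity scoring.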
For evaluation, we collect 34,638 pooled query relevance judgments (0.53 inter-annotator agreement as per Krippendorff's α) on 104 randomly selected argument units as queries from 26 corpora. The annotation process is detailed in the Appendix.
Based on the similarity scores of the approaches, we derive Boolean labels that indicate whether or not a unit is about one of the ontologies' topics, using two policies. The threshold policy labels a unit as about a topic if their similarity is above a threshold θ. The top-k policy labels a unit as about a topic if the topic is among the k topics with the highest similarity to the unit. For each approach, we report the parameter of the policy that achieved the highest F1-score on the pooled judgments.
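The two policies reduce to a few lines of code; the similarity scores below are made-up values for illustration.

```python
# Sketch of the two labeling policies for turning similarity scores into
# Boolean "aboutness" labels.
def threshold_policy(similarities, theta):
    # A unit is "about" every topic whose similarity exceeds theta.
    return {t for t, s in similarities.items() if s > theta}

def top_k_policy(similarities, k):
    # A unit is "about" the k topics with the highest similarity.
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return set(ranked[:k])

sims = {"pollution": 0.62, "recycling": 0.48, "education": 0.05}
print(threshold_policy(sims, theta=0.4))  # {'pollution', 'recycling'}
print(top_k_policy(sims, k=1))            # {'pollution'}
```

The threshold policy can assign any number of topics (including none), while top-k always assigns exactly k; which works better depends on how many topics a unit is typically about.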
Table 3 shows the results of this evaluation. The baseline produces mixed results across ontologies: it performs poorly both for the abstract topics in Wikipedia Level 1 and for the specific topics in WEF Level 2. The semantic interpretation approach clearly outperforms the baseline for all ontologies in terms of the F1-score. The Text2vec-SI approach outperforms the baseline and the semantic interpretation approach on abstract topics (Wikipedia Level 1), but its effectiveness is below that of the semantic interpretation approach on the other ontology levels.

Conclusion
The computational argumentation community risks topic bias in its approaches if the representativeness of topics in future corpora is not ensured. Achieving topic coverage is complicated by the fact that the landscape of controversial topics has not yet been well explored, and that there are no widely accepted ontologies for argument topics. In this paper, we venture into this future by mapping the landscape of argument topics and making it accessible for corpus construction and experimental design. We have identified three authoritative sources of ontological knowledge related to argument topics that provide an initial foundation for The Argument Ontology (TAO). For each source ontology, we evaluate the topic coverage of 39 argument corpora labeled with topics by matching the labels with the topics of the ontologies. To evaluate the topic coverage of corpora without topic labels, we develop an approach to identify the ontology topics of an argumentative text and achieve an F1 of 0.59.
Our analyses show that the topic coverage of existing argument corpora is both limited to a subset of the topics of the ontologies and skewed. Most topics that require expertise, such as mental health, philosophy, or international security, are treated only peripherally in argumentation corpora. Therefore, existing argumentation technologies are more suited to teaching people how to construct arguments in general than to helping them make decisions about such and similarly complex topics. For the development of robust argumentation technologies, corpora need to be carefully drawn from a specific domain to allow for reliable experiments and the development of generalizable classifiers.
Future work on TAO consists mainly of further surveying the argument topic landscape and unifying the various available ontologies. In addition to "is part of" relationships between topics, other relationship types can also be considered to build an argument topic knowledge base. However, our first version of TAO and our analyses can already help in selecting arguments for future corpus construction and model training.

Limitations
The three topic ontologies we used to evaluate the topic coverage of argument corpora are from authoritative sources. Nevertheless, they probably do not cover all possible controversial topics relevant to argumentation (e.g., topics concerning private life). Comprehensive coverage of controversial topics in breadth and depth will likely remain an unattainable goal. Moreover, unifying the three topic ontologies into a standard ontology is still an open problem, given the many possible interpretations of and relationships between the topics.
Another limitation is the moderate effectiveness achieved by our approaches for categorizing argument units. This is largely due to the large collection of controversial topics (748 for Wikipedia). Future work could improve on this by exploiting the structure of the topic ontology and hierarchical classifiers. Furthermore, it is also unclear whether the topic dependence of argumentation approaches decreases with increasing corpus size.

Ethics Statement
Our goal is to investigate whether and to what extent existing argumentation corpora are topic-biased. This serves to critically examine the state of the art. However, we by no means want to give the impression that previous corpus authors lack ambition or diligence; rather, the opposite is the case. The number of corpora that have been created in the last decade shows that the community is aware of the fact that not all areas of the argumentation landscape have been covered yet, and is therefore doing its utmost to explore it further. In a dynamic and rapidly growing research field, standards are usually developed in parallel with contributions, not in advance. Our research may therefore contribute to the further standardization of the corpus linguistics of argumentation.
The manual annotation of arguments and topics was done by expert annotators of our research groups. They were compensated fairly under German law. No personal data was collected.

Table 5: Performance of the semantic interpretation approaches in the human evaluation, for each topic ontology level, in terms of precision (P), recall (R), and F1-score (F) for the "aboutness" label. For methods other than the baselines, the table shows the values for the similarity threshold θ and rank k that lead to the highest F1-score, respectively. The best F1-scores for each ontology level are marked in bold.

Figure 2: Proportion of ontology topics covered by at least n corpus topics (per ontology level and per corpus).

Figure 5: Assessment interface for topic labeling.

Table 1: Survey of argument corpora, indicating data source, unit granularity, and size in terms of units and topics (if the authors remarked on it). The unit granularity is the one in the corpus' files, using premises and conclusions as one unit each and the best context-preserving unit for corpora featuring multiple granularities. We presume the topic selection directives from the corpus description: either manual selection by the authors, or source-driven, i.e., the topics in the selected source(s), taken from the units of a specific time span or by random sampling. Experiments (Exp.) denotes the count of papers that use the corpus in an experiment among those papers that cite the corpus' paper.

Table 2: Counts of the topic types in the 39 preprocessed corpora with examples and their normalized forms.

Figure 3: Distribution of corpus units over the top matching topics in an ontology (39 labeled corpora).