Maximilian Spliethöver


2023

pdf bib
Claim Optimization in Computational Argumentation
Gabriella Skitalinskaya | Maximilian Spliethöver | Henning Wachsmuth
Proceedings of the 16th International Natural Language Generation Conference

An optimal delivery of arguments is key to persuasion in any debate, both for humans and for AI systems. This requires the use of clear and fluent claims relevant to the given debate. Prior work has studied the automatic assessment of argument quality extensively. Yet, no approach actually improves the quality so far. To fill this gap, this paper proposes the task of claim optimization: to rewrite argumentative claims in order to optimize their delivery. As multiple types of optimization are possible, we approach this task by first generating a diverse set of candidate claims using a large language model, such as BART, taking into account contextual information. Then, the best candidate is selected using various quality metrics. In automatic and human evaluation on an English-language corpus, our quality-based candidate selection outperforms several baselines, improving 60% of all claims (worsening 16% only). Follow-up analyses reveal that, beyond copy editing, our approach often specifies claims with details, whereas it adds less evidence than humans do. Moreover, its capabilities generalize well to other domains, such as instructional texts.

2022

pdf bib
To Prefer or to Choose? Generating Agency and Power Counterfactuals Jointly for Gender Bias Mitigation
Maja Stahl | Maximilian Spliethöver | Henning Wachsmuth
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Gender bias may emerge from an unequal representation of agency and power, for example, by portraying women frequently as passive and powerless (“She accepted her future”) and men as proactive and powerful (“He chose his future”). When language models learn from respective texts, they may reproduce or even amplify the bias. An effective way to mitigate bias is to generate counterfactual sentences with opposite agency and power to the training. Recent work targeted agency-specific verbs from a lexicon to this end. We argue that this is insufficient, due to the interaction of agency and power and their dependence on context. In this paper, we thus develop a new rewriting model that identifies verbs with the desired agency and power in the context of the given sentence. The verbs’ probability is then boosted to encourage the model to rewrite both connotations jointly. According to automatic metrics, our model effectively controls for power while being competitive in agency to the state of the art. In our main evaluation, human annotators favored its counterfactuals in terms of both connotations, also deeming its meaning preservation better.

pdf bib
No Word Embedding Model Is Perfect: Evaluating the Representation Accuracy for Social Bias in the Media
Maximilian Spliethöver | Maximilian Keiff | Henning Wachsmuth
Findings of the Association for Computational Linguistics: EMNLP 2022

News articles both shape and reflect public opinion across the political spectrum. Analyzing them for social bias can thus provide valuable insights, such as prevailing stereotypes in society and the media, which are often adopted by NLP models trained on respective data. Recent work has relied on word embedding bias measures, such as WEAT. However, several representation issues of embeddings can harm the measures’ accuracy, including low-resource settings and token frequency differences. In this work, we study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles. To cover the whole spectrum of political bias in the US, we collect 500k articles and review psychology literature with respect to expected social bias. We then quantify social bias using WEAT along with embedding algorithms that account for the aforementioned issues. We compare how models trained with the algorithms on news articles represent the expected social bias. Our results suggest that the standard way to quantify bias does not align well with knowledge from psychology. While the proposed algorithms reduce the gap, they still do not fully match the literature.

2021

pdf bib
Key Point Analysis via Contrastive Learning and Extractive Argument Summarization
Milad Alshomary | Timon Gurcke | Shahbaz Syed | Philipp Heinisch | Maximilian Spliethöver | Philipp Cimiano | Martin Potthast | Henning Wachsmuth
Proceedings of the 8th Workshop on Argument Mining

Key point analysis is the task of extracting a set of concise and high-level statements from a given collection of arguments, representing the gist of these arguments. This paper presents our proposed approach to the Key Point Analysis Shared Task, colocated with the 8th Workshop on Argument Mining. The approach integrates two complementary components. One component employs contrastive learning via a siamese neural network for matching arguments to key points; the other is a graph-based extractive summarization model for generating key points. In both automatic and manual evaluation, our approach was ranked best among all submissions to the shared task.

2020

pdf bib
Argument from Old Man’s View: Assessing Social Bias in Argumentation
Maximilian Spliethöver | Henning Wachsmuth
Proceedings of the 7th Workshop on Argument Mining

Social bias in language - towards genders, ethnicities, ages, and other social groups - poses a problem with ethical impact for many NLP applications. Recent research has shown that machine learning models trained on respective data may not only adopt, but even amplify the bias. So far, however, little attention has been paid to bias in computational argumentation. In this paper, we study the existence of social biases in large English debate portals. In particular, we train word embedding models on portal-specific corpora and systematically evaluate their bias using WEAT, an existing metric to measure bias in word embeddings. In a word co-occurrence analysis, we then investigate causes of bias. The results suggest that all tested debate corpora contain unbalanced and biased data, mostly in favor of male people with European-American names. Our empirical insights contribute towards an understanding of bias in argumentative data sources.

2019

pdf bib
Is It Worth the Attention? A Comparative Evaluation of Attention Layers for Argument Unit Segmentation
Maximilian Spliethöver | Jonas Klaff | Hendrik Heuer
Proceedings of the 6th Workshop on Argument Mining

Attention mechanisms have seen some success for natural language processing downstream tasks in recent years and generated new state-of-the-art results. A thorough evaluation of the attention mechanism for the task of Argumentation Mining is missing. With this paper, we report a comparative evaluation of attention layers in combination with a bidirectional long short-term memory network, which is the current state-of-the-art approach for the unit segmentation task. We also compare sentence-level contextualized word embeddings to pre-generated ones. Our findings suggest that for this task, the additional attention layer does not improve the performance. In most cases, contextualized embeddings do also not show an improvement on the score achieved by pre-defined embeddings.