Julia Romberg

2025

Towards a Perspectivist Turn in Argument Quality Assessment
Julia Romberg | Maximilian Maurer | Henning Wachsmuth | Gabriella Lapesa
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The assessment of argument quality depends on well-established logical, rhetorical, and dialectical properties that are unavoidably subjective: multiple valid assessments may exist, there is no unequivocal ground truth. This aligns with recent paths in machine learning, which embrace the co-existence of different perspectives. However, this potential remains largely unexplored in NLP research on argument quality. One crucial reason seems to be the yet unexplored availability of suitable datasets. We fill this gap by conducting a systematic review of argument quality datasets. We assign them to a multi-layered categorization targeting two aspects: (a) What has been annotated: we collect the quality dimensions covered in datasets and consolidate them in an overarching taxonomy, increasing dataset comparability and interoperability. (b) Who annotated: we survey what information is given about annotators, enabling perspectivist research and grounding our recommendations for future actions. To this end, we discuss datasets suitable for developing perspectivist models (i.e., those containing individual, non-aggregated annotations), and we showcase the importance of a controlled selection of annotators in a pilot study.

2024

pdf bib abs

GESIS-DSM at PerpectiveArg2024: A Matter of Style? Socio-Cultural Differences in Argumentation
Maximilian Maurer | Julia Romberg | Myrthe Reuver | Negash Weldekiros | Gabriella Lapesa
Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024)

This paper describes the contribution of team GESIS-DSM to the Perspective Argument Retrieval Task, a task on retrieving socio-culturally relevant and diverse arguments for different user queries. Our experiments and analyses aim to explore the nature of the socio-cultural specialization in argument retrieval: (how) do the arguments written by different socio-cultural groups differ? We investigate the impact of content and style for the task of identifying arguments relevant to a query and a certain demographic attribute. In its different configurations, our system employs sentence embedding representations, arguments generated with Large Language Model, as well as stylistic features. final method places third overall in the shared task, and, in comparison, does particularly well in the most difficult evaluation scenario, where the socio-cultural background of the argument author is implicit (i.e. has to be inferred from the text). This result indicates that socio-cultural differences in argument production may indeed be a matter of style.

2023

pdf bib

Proceedings of the 10th Workshop on Argument Mining
Milad Alshomary | Chung-Chi Chen | Smaranda Muresan | Joonsuk Park | Julia Romberg
Proceedings of the 10th Workshop on Argument Mining

pdf bib abs

A General Framework for Multimodal Argument Persuasiveness Classification of Tweets
Mohammad Soltani | Julia Romberg
Proceedings of the 10th Workshop on Argument Mining

An important property of argumentation concerns the degree of its persuasiveness, which can be influenced by various modalities. On social media platforms, individuals usually have the option of supporting their textual statements with images. The goals of the ImageArg shared task, held with ArgMining 2023, were therefore (A) to classify tweet stances considering both modalities and (B) to predict the influence of an image on the persuasiveness of a tweet text. In this paper, we present our proposed methodology that shows strong performance on both tasks, placing 3rd team on the leaderboard in each case with F1 scores of 0.8273 (A) and 0.5281 (B). The framework relies on pre-trained models to extract text and image features, which are then fed into a task-specific classification model. Our experiments highlighted that the multimodal vision and language model CLIP holds a specific importance in the extraction of features, in particular for task (A).

pdf bib abs

Architectural Sweet Spots for Modeling Human Label Variation by the Example of Argument Quality: It’s Best to Relate Perspectives!
Philipp Heinisch | Matthias Orlikowski | Julia Romberg | Philipp Cimiano
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Many annotation tasks in natural language processing are highly subjective in that there can be different valid and justified perspectives on what is a proper label for a given example. This also applies to the judgment of argument quality, where the assignment of a single ground truth is often questionable. At the same time, there are generally accepted concepts behind argumentation that form a common ground. To best represent the interplay of individual and shared perspectives, we consider a continuum of approaches ranging from models that fully aggregate perspectives into a majority label to “share nothing”-architectures in which each annotator is considered in isolation from all other annotators. In between these extremes, inspired by models used in the field of recommender systems, we investigate the extent to which architectures that predict labels for single annotators but include layers that model the relations between different annotators are beneficial. By means of two tasks of argument quality classification (argument concreteness and validity/novelty of conclusions), we show that recommender architectures increase the averaged annotator-individual F1-scores up to 43% over a majority-label model. Our findings indicate that approaches to subjectivity can benefit from relating individual perspectives.

pdf bib abs

Mind the User! Measures to More Accurately Evaluate the Practical Value of Active Learning Strategies
Julia Romberg
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

One solution to limited annotation budgets is active learning (AL), a collaborative process of human and machine to strategically select a small but informative set of examples. While current measures optimize AL from a pure machine learning perspective, we argue that for a successful transfer into practice, additional criteria must target the second pillar of AL, the human annotator. In text classification, e.g., where practitioners regularly encounter datasets with an increased number of imbalanced classes, measures like F1 fall short when finding all classes or identifying rare cases is required. We therefore introduce four measures that reflect class-related demands that users place on data acquisition. In a comprehensive comparison of uncertainty-based, diversity-based, and hybrid query strategies on six different datasets, we find that strong F1 performance is not necessarily associated with full class coverage. Uncertainty sampling outperforms diversity sampling in selecting minority classes and covering classes more efficiently, while diversity sampling excels in selecting less monotonous batches. Our empirical findings emphasize that a holistic view is essential when evaluating AL approaches to ensure their usefulness in practice - the actual, but often overlooked, goal of development. To this end, standard measures for assessing the performance of text classification need to be complemented by such that more appropriately reflect user needs.

2022

pdf bib abs

Is Your Perspective Also My Perspective? Enriching Prediction with Subjectivity
Julia Romberg
Proceedings of the 9th Workshop on Argument Mining

Although argumentation can be highly subjective, the common practice with supervised machine learning is to construct and learn from an aggregated ground truth formed from individual judgments by majority voting, averaging, or adjudication. This approach leads to a neglect of individual, but potentially important perspectives and in many cases cannot do justice to the subjective character of the tasks. One solution to this shortcoming are multi-perspective approaches, which have received very little attention in the field of argument mining so far. In this work we present PerspectifyMe, a method to incorporate perspectivism by enriching a task with subjectivity information from the data annotation process. We exemplify our approach with the use case of classifying argument concreteness, and provide first promising results for the recently published CIMT PartEval Argument Concreteness Corpus.

pdf bib abs

A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification
Julia Romberg | Laura Mark | Tobias Escher
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Political authorities in democratic countries regularly consult the public in order to allow citizens to voice their ideas and concerns on specific issues. When trying to evaluate the (often large number of) contributions by the public in order to inform decision-making, authorities regularly face challenges due to restricted resources. We identify several tasks whose automated support can help in the evaluation of public participation. These are i) the recognition of arguments, more precisely premises and their conclusions, ii) the assessment of the concreteness of arguments, iii) the detection of textual descriptions of locations in order to assign citizens’ ideas to a spatial location, and iv) the thematic categorization of contributions. To enable future research efforts to develop techniques addressing these four tasks, we introduce the CIMT PartEval Corpus, a new publicly-available German-language corpus that includes several thousand citizen contributions from six mobility-related planning processes in five German municipalities. The corpus provides annotations for each of these tasks which have not been available in German for the domain of public participation before either at all or in this scope and variety.

2021

pdf bib abs

Citizen Involvement in Urban Planning - How Can Municipalities Be Supported in Evaluating Public Participation Processes for Mobility Transitions?
Julia Romberg | Stefan Conrad
Proceedings of the 8th Workshop on Argument Mining

Public participation processes allow citizens to engage in municipal decision-making processes by expressing their opinions on specific issues. Municipalities often only have limited resources to analyze a possibly large amount of textual contributions that need to be evaluated in a timely and detailed manner. Automated support for the evaluation is therefore essential, e.g. to analyze arguments. In this paper, we address (A) the identification of argumentative discourse units and (B) their classification as major position or premise in German public participation processes. The objective of our work is to make argument mining viable for use in municipalities. We compare different argument mining approaches and develop a generic model that can successfully detect argument structures in different datasets of mobility-related urban planning. We introduce a new data corpus comprising five public participation processes. In our evaluation, we achieve high macro F1 scores (0.76 - 0.80 for the identification of argumentative units; 0.86 - 0.93 for their classification) on all datasets. Additionally, we improve previous results for the classification of argumentative units on a similar German online participation dataset.

2020

pdf bib abs

Identifying patient information needs is an important issue for health care services and implementation of patient-centered care. A relevant number of people with diabetes mellitus experience a need for information during the course of the disease. Health-related online forums are a promising option for researching relevant information needs closely related to everyday life. In this paper, we present a novel data corpus comprising 4,664 contributions from an online diabetes forum in German language. Two annotation tasks were implemented. First, the contributions were categorised according to whether they contain a diabetes-specific information need or not, which might either be a non diabetes-specific information need or no information need at all, resulting in an agreement of 0.89 (Krippendorff’s α). Moreover, the textual content of diabetes-specific information needs was segmented and labeled using a well-founded definition of health-related information needs, which achieved a promising agreement of 0.82 (Krippendorff’s αu). We further report a baseline for two sub-tasks of the information extraction system planned for the long term: contribution categorization and segment classification.

2019

pdf bib abs

HHU at SemEval-2019 Task 6: Context Does Matter - Tackling Offensive Language Identification and Categorization with ELMo
Alexander Oberstrass | Julia Romberg | Anke Stoll | Stefan Conrad
Proceedings of the 13th International Workshop on Semantic Evaluation

We present our results for OffensEval: Identifying and Categorizing Offensive Language in Social Media (SemEval 2019 - Task 6). Our results show that context embeddings are important features for the three different sub-tasks in connection with classical machine and with deep learning. Our best model reached place 3 of 75 in sub-task B with a macro F₁ of 0.719. Our approaches for sub-task A and C perform less well but could also deliver promising results.

2017

pdf bib abs

HHU at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Data using Machine Learning Methods
Tobias Cabanski | Julia Romberg | Stefan Conrad
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this Paper a system for solving SemEval-2017 Task 5 is presented. This task is divided into two tracks where the sentiment of microblog messages and news headlines has to be predicted. Since two submissions were allowed, two different machine learning methods were developed to solve this task, a support vector machine approach and a recurrent neural network approach. To feed in data for these approaches, different feature extraction methods are used, mainly word representations and lexica. The best submissions for both tracks are provided by the recurrent neural network which achieves a F1-score of 0.729 in track 1 and 0.702 in track 2.

Venues

Julia Romberg

2025

2024

2023

2022

2021

2020

2019

2017

Co-authors

Venues