Mattes Ruckdeschel
2025
Just Read the Codebook! Make Use of Quality Codebooks in Zero-Shot Classification of Multilabel Frame Datasets
Mattes Ruckdeschel
Proceedings of the 31st International Conference on Computational Linguistics
The recent development of Large Language Models lowered the barrier to entry for using Natural Language Processing methods for various tasks in the related scientific field of Computational Social Science and has led to more scrutiny of their performance on complex datasets. While in many cases the costly fine-tuning of smaller Language Models outperforms LLMs, zero and few-shot approaches on consumer hardware have the potential to deepen interdisciplinary research efforts, whilst opening up NLP research to complex, niche datasets that are hard to classify. The great effort that is coding datasets comes with the benefit of concise instructions for how to code the data at hand. We investigate, whether highly specific, instructive codebooks created by social scientists in order to code text with a multitude of complex labels can improve zero-shot performance on (quantized) LLMs. Our findings show, that using the latest LLMs, zero-shot performance can improve by providing a codebook on two complex datasets with a total of four different topics and can outperform few-shot In-Context-Learning setups. The approach is equally or more token-efficient, and requires less hands-on engineering, making it particularly compelling for practical research.
2022
Boundary Detection and Categorization of Argument Aspects via Supervised Learning
Mattes Ruckdeschel
|
Gregor Wiedemann
Proceedings of the 9th Workshop on Argument Mining
Aspect-based argument mining (ABAM) is the task of automatic _detection_ and _categorization_ of argument aspects, i.e. the parts of an argumentative text that contain the issue-specific key rationale for its conclusion. From empirical data, overlapping but not congruent sets of aspect categories can be derived for different topics. So far, two supervised approaches to detect aspect boundaries, and a smaller number of unsupervised clustering approaches to categorize groups of similar aspects have been proposed. With this paper, we introduce the Argument Aspect Corpus (AAC) that contains token-level annotations of aspects in 3,547 argumentative sentences from three highly debated topics. This dataset enables both the supervised learning of boundaries and categorization of argument aspects. During the design of our annotation process, we noticed that it is not clear from the outset at which contextual unit aspects should be coded. We, thus, experiment with classification at the token, chunk, and sentence level granularity. Our finding is that the chunk level provides the most useful information for applications. At the same time, it produces the best performing results in our tested supervised learning setups.
Few-Shot Learning for Argument Aspects of the Nuclear Energy Debate
Lena Jurkschat
|
Gregor Wiedemann
|
Maximilian Heinrich
|
Mattes Ruckdeschel
|
Sunna Torge
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We approach aspect-based argument mining as a supervised machine learning task to classify arguments into semantically coherent groups referring to the same defined aspect categories. As an exemplary use case, we introduce the Argument Aspect Corpus - Nuclear Energy that separates arguments about the topic of nuclear energy into nine major aspects. Since the collection of training data for further aspects and topics is costly, we investigate the potential for current transformer-based few-shot learning approaches to accurately classify argument aspects. The best approach is applied to a British newspaper corpus covering the debate on nuclear energy over the past 21 years. Our evaluation shows that a stable prediction of shares of argument aspects in this debate is feasible with 50 to 100 training samples per aspect. Moreover, we see signals for a clear shift in the public discourse in favor of nuclear energy in recent years. This revelation of changing patterns of pro and contra arguments related to certain aspects over time demonstrates the potential of supervised argument aspect detection for tracking issue-specific media discourses.