Thorben Jansen

2025

Don’t Score too Early! Evaluating Argument Mining Models on Incomplete Essays
Nils-Jonathan Schaller | Yuning Ding | Thorben Jansen | Andrea Horbach
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Students’ argumentative writing benefits from receiving automated feedback, particularly throughout the writing process. While Argument Mining (AM) technology shows promise for delivering automated feedback on argumentative structures, existing systems are frequently trained on completed essays, providing rich context information and raising concerns about their usefulness for offering writing support on incomplete texts during the writing process. This study evaluates the robustness of AM algorithms on artificially fragmented learner texts from two large-scale corpora of secondary school essays: the German DARIUS corpus and the English PERSUADE corpus. Our analysis reveals that token-level sequence-tagging methods, while highly effective on complete essays, suffer significantly when context is limited or misleading. Conversely, sentence-level classifiers maintain relative stability under such conditions. We show that deliberately training AM models on fragmented input substantially mitigates these context-related weaknesses, enabling AM systems to support dynamic educational writing scenarios better.

2024

pdf bib abs

Fairness in Automated Essay Scoring: A Comparative Analysis of Algorithms on German Learner Essays from Secondary Education
Nils-Jonathan Schaller | Yuning Ding | Andrea Horbach | Jennifer Meyer | Thorben Jansen
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Pursuing educational equity, particularly in writing instruction, requires that all students receive fair (i.e., accurate and unbiased) assessment and feedback on their texts. Automated Essay Scoring (AES) algorithms have so far focused on optimizing the mean accuracy of their scores and paid less attention to fair scores for all subgroups, although research shows that students receive unfair scores on their essays in relation to demographic variables, which in turn are related to their writing competence. We add to the literature arguing that AES should also optimize for fairness by presenting insights on the fairness of scoring algorithms on a corpus of learner texts in the German language and introduce the novelty of examining fairness on psychological and demographic differences in addition to demographic differences. We compare shallow learning, deep learning, and large language models with full and skewed subsets of training data to investigate what is needed for fair scoring. The results show that training on a skewed subset of higher and lower cognitive ability students shows no bias but very low accuracy for students outside the training set. Our results highlight the need for specific training data on all relevant user groups, not only for demographic background variables but also for cognitive abilities as psychological student characteristics.

pdf bib abs

Transfer Learning of Argument Mining in Student Essays
Yuning Ding | Julian Lohmann | Nils-Jonathan Schaller | Thorben Jansen | Andrea Horbach
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

This paper explores the transferability of a cross-prompt argument mining model trained on argumentative essays authored by native English-speaking learners (EN-L1) across educational contexts and languages. Specifically, the adaptability of a multilingual transformer model is assessed through its application to comparable argumentative essays authored by English-as-a-foreign-language learners (EN-L2) for context transfer, and a dataset composed of essays written by native German learners (DE) for both language and task transfer. To separate language effects from educational context effects, we also perform experiments on a machine-translated version of the German dataset (DE-MT). Our findings demonstrate that, even under zero-shot conditions, a model trained on native English speakers exhibits satisfactory performance on the EN-L2/DE datasets. Machine translation does not substantially enhance this performance, suggesting that distinct writing styles across educational contexts impact performance more than language differences.

pdf bib abs

DARIUS: A Comprehensive Learner Corpus for Argument Mining in German-Language Essays
Nils-Jonathan Schaller | Andrea Horbach | Lars Ingver Höft | Yuning Ding | Jan Luca Bahr | Jennifer Meyer | Thorben Jansen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we present the DARIUS (Digital Argumentation Instruction for Science) corpus for argumentation quality on 4589 essays written by 1839 German secondary school students. The corpus is annotated according to a fine-grained annotation scheme, ranging from a broader perspective like content zones, to more granular features like argumentation coverage/reach and argumentative discourse units like claims and warrants. The features have inter-annotator agreements up to 0.83 Krippendorff’s α. The corpus and dataset are publicly available for further research in argument mining.

Thorben Jansen

2025

2024

2022

Co-authors

Venues