@inproceedings{schaller-etal-2025-dont,
title = "Don{'}t Score too Early! Evaluating Argument Mining Models on Incomplete Essays",
author = "Schaller, Nils-Jonathan and
Ding, Yuning and
Jansen, Thorben and
Horbach, Andrea",
editor = {Kochmar, Ekaterina and
Alhafni, Bashar and
Bexte, Marie and
Burstein, Jill and
Horbach, Andrea and
Laarmann-Quante, Ronja and
Tack, Ana{\"i}s and
Yaneva, Victoria and
Yuan, Zheng},
booktitle = "Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.bea-1.27/",
doi = "10.18653/v1/2025.bea-1.27",
pages = "345--355",
ISBN = "979-8-89176-270-1",
abstract = "Students' argumentative writing benefits from receiving automated feedback, particularly throughout the writing process. While Argument Mining (AM) technology shows promise for delivering automated feedback on argumentative structures, existing systems are frequently trained on completed essays, providing rich context information and raising concerns about their usefulness for offering writing support on incomplete texts during the writing process. This study evaluates the robustness of AM algorithms on artificially fragmented learner texts from two large-scale corpora of secondary school essays: the German DARIUS corpus and the English PERSUADE corpus. Our analysis reveals that token-level sequence-tagging methods, while highly effective on complete essays, suffer significantly when context is limited or misleading. Conversely, sentence-level classifiers maintain relative stability under such conditions. We show that deliberately training AM models on fragmented input substantially mitigates these context-related weaknesses, enabling AM systems to support dynamic educational writing scenarios better."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="schaller-etal-2025-dont">
<titleInfo>
<title>Don’t Score too Early! Evaluating Argument Mining Models on Incomplete Essays</title>
</titleInfo>
<name type="personal">
<namePart type="given">Nils-Jonathan</namePart>
<namePart type="family">Schaller</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuning</namePart>
<namePart type="family">Ding</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Thorben</namePart>
<namePart type="family">Jansen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andrea</namePart>
<namePart type="family">Horbach</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Kochmar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bashar</namePart>
<namePart type="family">Alhafni</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marie</namePart>
<namePart type="family">Bexte</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jill</namePart>
<namePart type="family">Burstein</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andrea</namePart>
<namePart type="family">Horbach</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ronja</namePart>
<namePart type="family">Laarmann-Quante</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anaïs</namePart>
<namePart type="family">Tack</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Victoria</namePart>
<namePart type="family">Yaneva</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zheng</namePart>
<namePart type="family">Yuan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-270-1</identifier>
</relatedItem>
<abstract>Students’ argumentative writing benefits from receiving automated feedback, particularly throughout the writing process. While Argument Mining (AM) technology shows promise for delivering automated feedback on argumentative structures, existing systems are frequently trained on completed essays, providing rich context information and raising concerns about their usefulness for offering writing support on incomplete texts during the writing process. This study evaluates the robustness of AM algorithms on artificially fragmented learner texts from two large-scale corpora of secondary school essays: the German DARIUS corpus and the English PERSUADE corpus. Our analysis reveals that token-level sequence-tagging methods, while highly effective on complete essays, suffer significantly when context is limited or misleading. Conversely, sentence-level classifiers maintain relative stability under such conditions. We show that deliberately training AM models on fragmented input substantially mitigates these context-related weaknesses, enabling AM systems to support dynamic educational writing scenarios better.</abstract>
<identifier type="citekey">schaller-etal-2025-dont</identifier>
<identifier type="doi">10.18653/v1/2025.bea-1.27</identifier>
<location>
<url>https://aclanthology.org/2025.bea-1.27/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>345</start>
<end>355</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Don’t Score too Early! Evaluating Argument Mining Models on Incomplete Essays
%A Schaller, Nils-Jonathan
%A Ding, Yuning
%A Jansen, Thorben
%A Horbach, Andrea
%Y Kochmar, Ekaterina
%Y Alhafni, Bashar
%Y Bexte, Marie
%Y Burstein, Jill
%Y Horbach, Andrea
%Y Laarmann-Quante, Ronja
%Y Tack, Anaïs
%Y Yaneva, Victoria
%Y Yuan, Zheng
%S Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-270-1
%F schaller-etal-2025-dont
%X Students’ argumentative writing benefits from receiving automated feedback, particularly throughout the writing process. While Argument Mining (AM) technology shows promise for delivering automated feedback on argumentative structures, existing systems are frequently trained on completed essays, providing rich context information and raising concerns about their usefulness for offering writing support on incomplete texts during the writing process. This study evaluates the robustness of AM algorithms on artificially fragmented learner texts from two large-scale corpora of secondary school essays: the German DARIUS corpus and the English PERSUADE corpus. Our analysis reveals that token-level sequence-tagging methods, while highly effective on complete essays, suffer significantly when context is limited or misleading. Conversely, sentence-level classifiers maintain relative stability under such conditions. We show that deliberately training AM models on fragmented input substantially mitigates these context-related weaknesses, enabling AM systems to support dynamic educational writing scenarios better.
%R 10.18653/v1/2025.bea-1.27
%U https://aclanthology.org/2025.bea-1.27/
%U https://doi.org/10.18653/v1/2025.bea-1.27
%P 345-355
Markdown (Informal)
[Don’t Score too Early! Evaluating Argument Mining Models on Incomplete Essays](https://aclanthology.org/2025.bea-1.27/) (Schaller et al., BEA 2025)
ACL
Nils-Jonathan Schaller, Yuning Ding, Thorben Jansen, and Andrea Horbach. 2025. Don’t Score too Early! Evaluating Argument Mining Models on Incomplete Essays. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 345–355, Vienna, Austria. Association for Computational Linguistics.