BibTeX
@inproceedings{barber-etal-2025-active,
title = "When Does Active Learning Actually Help? Empirical Insights with Transformer-based Automated Scoring",
author = "Barber, Justin O and
Hemenway, Michael P. and
Wolfe, Edward",
editor = "Wilson, Joshua and
Ormerod, Christopher and
Beiting Parrish, Magdalen",
booktitle = "Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers",
month = oct,
year = "2025",
address = "Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States",
publisher = "National Council on Measurement in Education (NCME)",
url = "https://aclanthology.org/2025.aimecon-sessions.1/",
pages = "1--8",
ISBN = "979-8-218-84230-7",
abstract = "Developing automated essay scoring (AES) systems typically demands extensive human annotation, incurring significant costs and requiring considerable time. Active learning (AL) methods aim to alleviate this challenge by strategically selecting the most informative essays for scoring, thereby potentially reducing annotation requirements without compromising model accuracy. This study systematically evaluates four prominent AL strategies{---}uncertainty sampling, BatchBALD, BADGE, and a novel GenAI-based uncertainty approach{---}against a random sampling baseline, using DeBERTa-based regression models across multiple assessment prompts exhibiting varying degrees of human scorer agreement. Contrary to initial expectations, we found that AL methods provided modest but meaningful improvements only for prompts characterized by poor scorer reliability ({\ensuremath{<}}60{\%} agreement per score point). Notably, extensive hyperparameter optimization alone substantially reduced the annotation budget required to achieve near-optimal scoring performance, even with random sampling. Our findings underscore that while targeted AL methods can be beneficial in contexts of low scorer reliability, rigorous hyperparameter tuning remains a foundational and highly effective strategy for minimizing annotation costs in AES system development."
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="barber-etal-2025-active">
    <titleInfo>
      <title>When Does Active Learning Actually Help? Empirical Insights with Transformer-based Automated Scoring</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Justin</namePart>
      <namePart type="given">O</namePart>
      <namePart type="family">Barber</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Michael</namePart>
      <namePart type="given">P</namePart>
      <namePart type="family">Hemenway</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Edward</namePart>
      <namePart type="family">Wolfe</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-10</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Joshua</namePart>
        <namePart type="family">Wilson</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Christopher</namePart>
        <namePart type="family">Ormerod</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Magdalen</namePart>
        <namePart type="family">Beiting Parrish</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>National Council on Measurement in Education (NCME)</publisher>
        <place>
          <placeTerm type="text">Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-218-84230-7</identifier>
    </relatedItem>
    <abstract>Developing automated essay scoring (AES) systems typically demands extensive human annotation, incurring significant costs and requiring considerable time. Active learning (AL) methods aim to alleviate this challenge by strategically selecting the most informative essays for scoring, thereby potentially reducing annotation requirements without compromising model accuracy. This study systematically evaluates four prominent AL strategies—uncertainty sampling, BatchBALD, BADGE, and a novel GenAI-based uncertainty approach—against a random sampling baseline, using DeBERTa-based regression models across multiple assessment prompts exhibiting varying degrees of human scorer agreement. Contrary to initial expectations, we found that AL methods provided modest but meaningful improvements only for prompts characterized by poor scorer reliability (&lt;60% agreement per score point). Notably, extensive hyperparameter optimization alone substantially reduced the annotation budget required to achieve near-optimal scoring performance, even with random sampling. Our findings underscore that while targeted AL methods can be beneficial in contexts of low scorer reliability, rigorous hyperparameter tuning remains a foundational and highly effective strategy for minimizing annotation costs in AES system development.</abstract>
    <identifier type="citekey">barber-etal-2025-active</identifier>
    <location>
      <url>https://aclanthology.org/2025.aimecon-sessions.1/</url>
    </location>
    <part>
      <date>2025-10</date>
      <extent unit="page">
        <start>1</start>
        <end>8</end>
      </extent>
    </part>
  </mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T When Does Active Learning Actually Help? Empirical Insights with Transformer-based Automated Scoring
%A Barber, Justin O.
%A Hemenway, Michael P.
%A Wolfe, Edward
%Y Wilson, Joshua
%Y Ormerod, Christopher
%Y Beiting Parrish, Magdalen
%S Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers
%D 2025
%8 October
%I National Council on Measurement in Education (NCME)
%C Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
%@ 979-8-218-84230-7
%F barber-etal-2025-active
%X Developing automated essay scoring (AES) systems typically demands extensive human annotation, incurring significant costs and requiring considerable time. Active learning (AL) methods aim to alleviate this challenge by strategically selecting the most informative essays for scoring, thereby potentially reducing annotation requirements without compromising model accuracy. This study systematically evaluates four prominent AL strategies—uncertainty sampling, BatchBALD, BADGE, and a novel GenAI-based uncertainty approach—against a random sampling baseline, using DeBERTa-based regression models across multiple assessment prompts exhibiting varying degrees of human scorer agreement. Contrary to initial expectations, we found that AL methods provided modest but meaningful improvements only for prompts characterized by poor scorer reliability (<60% agreement per score point). Notably, extensive hyperparameter optimization alone substantially reduced the annotation budget required to achieve near-optimal scoring performance, even with random sampling. Our findings underscore that while targeted AL methods can be beneficial in contexts of low scorer reliability, rigorous hyperparameter tuning remains a foundational and highly effective strategy for minimizing annotation costs in AES system development.
%U https://aclanthology.org/2025.aimecon-sessions.1/
%P 1-8
Markdown (Informal)
[When Does Active Learning Actually Help? Empirical Insights with Transformer-based Automated Scoring](https://aclanthology.org/2025.aimecon-sessions.1/) (Barber et al., AIME-Con 2025)
ACL
Justin O. Barber, Michael P. Hemenway, and Edward Wolfe. 2025. When Does Active Learning Actually Help? Empirical Insights with Transformer-based Automated Scoring. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers, pages 1–8, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).