From Entropy to Generalizability: Strengthening Automated Essay Scoring Reliability and Sustainability

Yi Gui


Abstract
Generalizability Theory with entropy-derived stratification optimized automated essay scoring reliability. A G-study decomposed variance across 14 encoders and 3 seeds; D-studies identified minimal ensembles achieving G ≥ 0.85. A hybrid of one medium and one small encoder with two seeds maximized dependability per compute cost. Stratification ensured uniform precision across
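The D-study step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fully crossed essay × encoder × seed design and uses the standard relative G coefficient, Eρ² = σ²_p / (σ²_p + σ²_pe/n_e + σ²_ps/n_s + σ²_pes/(n_e·n_s)). The variance components, the function names, and the cost measure (encoders × seeds) are hypothetical placeholders chosen for illustration; they are not the paper's estimates or its actual design.

```python
from itertools import product


def g_coefficient(var, n_enc, n_seed):
    """Relative G coefficient for an assumed essay x encoder x seed crossed design.

    `var` maps component names to variance estimates:
    "p" (essay), "pe" (essay x encoder), "ps" (essay x seed),
    "pes" (three-way interaction confounded with residual error).
    """
    rel_error = (
        var["pe"] / n_enc
        + var["ps"] / n_seed
        + var["pes"] / (n_enc * n_seed)
    )
    return var["p"] / (var["p"] + rel_error)


def minimal_design(var, target=0.85, max_enc=14, max_seed=3):
    """Return (cost, n_enc, n_seed) for the cheapest design reaching `target`.

    Cost is taken to be n_enc * n_seed (number of model runs), which is an
    assumption of this sketch. Returns None if no design reaches the target.
    """
    candidates = [
        (n_enc * n_seed, n_enc, n_seed)
        for n_enc, n_seed in product(range(1, max_enc + 1), range(1, max_seed + 1))
        if g_coefficient(var, n_enc, n_seed) >= target
    ]
    return min(candidates) if candidates else None


# Hypothetical variance components, for illustration only.
example_var = {"p": 0.60, "pe": 0.20, "ps": 0.02, "pes": 0.18}

if __name__ == "__main__":
    design = minimal_design(example_var)
    print(design, g_coefficient(example_var, design[1], design[2]))
```

Averaging over more encoders or seeds shrinks the corresponding error terms, so the search simply enumerates the grid bounded by the 14 encoders and 3 seeds named in the abstract and keeps the cheapest combination that meets the G ≥ 0.85 threshold.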
Anthology ID:
2025.aimecon-main.34
Volume:
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Month:
October
Year:
2025
Address:
Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors:
Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue:
AIME-Con
Publisher:
National Council on Measurement in Education (NCME)
Pages:
312–328
URL:
https://aclanthology.org/2025.aimecon-main.34/
Cite (ACL):
Yi Gui. 2025. From Entropy to Generalizability: Strengthening Automated Essay Scoring Reliability and Sustainability. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 312–328, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal):
From Entropy to Generalizability: Strengthening Automated Essay Scoring Reliability and Sustainability (Gui, AIME-Con 2025)
PDF:
https://aclanthology.org/2025.aimecon-main.34.pdf