Scott Hellman


2023

pdf bib
Scalable and Explainable Automated Scoring for Open-Ended Constructed Response Math Word Problems
Scott Hellman | Alejandro Andrade | Kyle Habermehl
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

Open-ended constructed response math word problems (“math plus text”, or MPT) are a powerful tool in the assessment of students’ abilities to engage in mathematical reasoning and creative thinking. Such problems ask the student to compute a value or construct an expression and then explain, potentially in prose, what steps they took and why they took them. MPT items can be scored against highly structured rubrics, and we develop a novel technique for the automated scoring of MPT items that leverages these rubrics to provide explainable scoring. We show that our approach can be trained automatically and performs well on a large dataset of 34,417 responses across 14 MPT items.

2020

pdf bib
Multiple Instance Learning for Content Feedback Localization without Annotation
Scott Hellman | William Murray | Adam Wiemerslage | Mark Rosenstein | Peter Foltz | Lee Becker | Marcia Derr
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Automated Essay Scoring (AES) can be used to automatically generate holistic scores with reliability comparable to human scoring. In addition, AES systems can provide formative feedback to learners, typically at the essay level. In contrast, we are interested in providing feedback specialized to the content of the essay, and specifically for the content areas required by the rubric. A key objective is that the feedback should be localized alongside the relevant essay text. An important step in this process is determining where in the essay the rubric designated points and topics are discussed. A natural approach to this task is to train a classifier using manually annotated data; however, collecting such data is extremely resource intensive. Instead, we propose a method to predict these annotation spans without requiring any labeled annotation data. Our approach is to consider AES as a Multiple Instance Learning (MIL) task. We show that such models can both predict content scores and localize content by leveraging their sentence-level score predictions. This capability arises despite never having access to annotation training data. Implications are discussed for improving formative feedback and explainable AES models.