Detecting Math Misconceptions: An AI Benchmark Dataset

Bethany Rittle-Johnson, Rebecca Adler, Kelley Durkin, L Burleigh, Jules King, Scott Crossley


Abstract
To harness the promise of AI for improving math education, AI models need to be able to diagnose math misconceptions. We created an AI benchmark dataset on math misconceptions and other instructionally relevant errors, comprising over 52,000 explanations written across 15 math questions and scored by expert human raters.
Anthology ID:
2025.aimecon-wip.3
Volume:
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress
Month:
October
Year:
2025
Address:
Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors:
Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue:
AIME-Con
Publisher:
National Council on Measurement in Education (NCME)
Pages:
20–24
URL:
https://aclanthology.org/2025.aimecon-wip.3/
Cite (ACL):
Bethany Rittle-Johnson, Rebecca Adler, Kelley Durkin, L Burleigh, Jules King, and Scott Crossley. 2025. Detecting Math Misconceptions: An AI Benchmark Dataset. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress, pages 20–24, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal):
Detecting Math Misconceptions: An AI Benchmark Dataset (Rittle-Johnson et al., AIME-Con 2025)
PDF:
https://aclanthology.org/2025.aimecon-wip.3.pdf