Enhancing Item Difficulty Prediction in Large-scale Assessment with Large Language Model

Mubarak Mojoyinola, Olasunkanmi James Kehinde, Judy Tang


Abstract
Field testing is a resource-intensive bottleneck in test development. This study applies an interpretable framework that leverages a Large Language Model (LLM) for structured feature extraction from TIMSS items. These features are then used to train several classifiers, whose predictions are explained with SHAP, providing actionable, diagnostic insights for item writers.
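As a rough illustration only (not the authors' code), the pipeline the abstract describes could look like the following Python sketch: LLM-extracted item features feed a classifier, and SHAP attributes each difficulty prediction back to those features. The feature names, labels, and model choice here are hypothetical placeholders.

# Minimal sketch of the described pipeline, under assumed inputs.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical LLM-extracted features for TIMSS items (one row per item),
# e.g. reading load, number of solution steps, presence of a diagram.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
feature_names = ["reading_load", "solution_steps", "has_diagram"]
# Hypothetical binary difficulty label (0 = easy, 1 = hard).
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# SHAP attributes each prediction to individual item features, which is
# the kind of feature-level diagnostic feedback item writers would see.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_test)
print(np.abs(np.asarray(shap_values)).mean(axis=0))  # mean |SHAP| per feature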
Anthology ID:
2025.aimecon-wip.27
Volume:
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress
Month:
October
Year:
2025
Address:
Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors:
Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue:
AIME-Con
Publisher:
National Council on Measurement in Education (NCME)
Pages:
218–222
URL:
https://aclanthology.org/2025.aimecon-wip.27/
Cite (ACL):
Mubarak Mojoyinola, Olasunkanmi James Kehinde, and Judy Tang. 2025. Enhancing Item Difficulty Prediction in Large-scale Assessment with Large Language Model. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress, pages 218–222, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal):
Enhancing Item Difficulty Prediction in Large-scale Assessment with Large Language Model (Mojoyinola et al., AIME-Con 2025)
PDF:
https://aclanthology.org/2025.aimecon-wip.27.pdf