Hongli Li


2025

Comparing AI Tools and Human Raters in Predicting Reading Item Difficulty
Hongli Li | Roula Aldib | Chad Marchong | Kevin Fan
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress

This study compares AI tools and human raters in predicting the difficulty of reading comprehension items without response data. Predictions from AI models (ChatGPT, Gemini, Claude, and DeepSeek) and from human raters are evaluated against empirical difficulty values derived from student responses. The findings will shed light on AI's potential to support test development.