Simulating Rating Scale Responses with LLMs for Early-Stage Item Evaluation
Onur Demirkaya | Hsin-Ro Wei | Evelyn Johnson
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
This study explores the use of large language models (LLMs) to simulate human responses to Likert-scale items. A DeBERTa-base model fine-tuned on item text and examinee ability emulates a graded response model (GRM). High alignment with GRM probabilities and reasonable threshold recovery support LLMs as scalable tools for early-stage item evaluation.
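For reference, the GRM the abstract targets is Samejima's graded response model; a minimal sketch of its standard category probabilities follows (the parameter symbols are the conventional ones, not notation taken from the paper itself):

\[
P^{*}_{ik}(\theta_j) \;=\; \frac{1}{1 + \exp\!\left[-a_i\,(\theta_j - b_{ik})\right]},
\qquad
P_{ik}(\theta_j) \;=\; P^{*}_{ik}(\theta_j) - P^{*}_{i,\,k+1}(\theta_j),
\]

with the conventions $P^{*}_{i1}(\theta_j) = 1$ and $P^{*}_{i,K+1}(\theta_j) = 0$ for an item with $K$ ordered categories, where $a_i$ is the item discrimination, $b_{ik}$ are the category thresholds, and $\theta_j$ is the examinee ability. Alignment with "GRM probabilities" in the abstract refers to the model-implied $P_{ik}(\theta_j)$ values.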