Onur Demirkaya


2025

Simulating Rating Scale Responses with LLMs for Early-Stage Item Evaluation
Onur Demirkaya | Hsin-Ro Wei | Evelyn Johnson
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers

This study explores the use of large language models (LLMs) to simulate human responses to Likert-scale items. A DeBERTa-base model, fine-tuned on item text and examinee ability, emulates a graded response model (GRM). High alignment with GRM-implied probabilities and reasonable threshold recovery support LLMs as scalable tools for early-stage item evaluation.
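
For context, the graded response model referenced in the abstract maps examinee ability and item parameters to category response probabilities. A standard logistic form (Samejima's GRM; the exact parameterization used in the paper is an assumption here, with item discrimination a_j and category thresholds b_{jk}) is:

P(X_{ij} \ge k \mid \theta_i) = \frac{1}{1 + \exp\{-a_j(\theta_i - b_{jk})\}}, \qquad
P(X_{ij} = k \mid \theta_i) = P(X_{ij} \ge k \mid \theta_i) - P(X_{ij} \ge k+1 \mid \theta_i).

Under this formulation, "alignment with GRM probabilities" means the fine-tuned model's predicted category probabilities track these curves, and "threshold recovery" refers to recovering the b_{jk} values.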