Simulating Rating Scale Responses with LLMs for Early-Stage Item Evaluation
Onur Demirkaya | Hsin-Ro Wei | Evelyn Johnson
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
This study explores the use of large language models (LLMs) to simulate human responses to Likert-scale items. A DeBERTa-base model fine-tuned on item text and examinee ability emulates a graded response model (GRM). High alignment with GRM probabilities and reasonable threshold recovery support LLMs as scalable tools for early-stage item evaluation.
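For reference, the GRM the abstract targets is Samejima's graded response model; a minimal sketch of its standard category probabilities follows (the parameter symbols are the conventional ones, not notation taken from the paper itself):

\[
P^{*}_{ik}(\theta_j) \;=\; \frac{1}{1 + \exp\!\left[-a_i\,(\theta_j - b_{ik})\right]},
\qquad
P_{ik}(\theta_j) \;=\; P^{*}_{ik}(\theta_j) - P^{*}_{i,\,k+1}(\theta_j),
\]

with the conventions $P^{*}_{i1}(\theta_j) = 1$ and $P^{*}_{i,K+1}(\theta_j) = 0$ for an item with $K$ ordered categories, where $a_i$ is the item discrimination, $b_{ik}$ are the category thresholds, and $\theta_j$ is the examinee ability. Alignment with "GRM probabilities" in the abstract refers to the model-implied $P_{ik}(\theta_j)$ values.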