Evelyn Johnson
2025
Simulating Rating Scale Responses with LLMs for Early-Stage Item Evaluation
Onur Demirkaya
|
Hsin-Ro Wei
|
Evelyn Johnson
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
This study explores the use of large language models to simulate human responses to Likert-scale items. A DeBERTa-base model fine-tuned on item text and examinee ability emulates a graded response model (GRM). High alignment with GRM probabilities and reasonable threshold recovery support LLMs as scalable tools for early-stage item evaluation.
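For readers unfamiliar with the graded response model referenced in this abstract, a standard formulation (Samejima's GRM; the paper may use a variant) defines the probability of responding in category k of item i as a difference of cumulative logistic functions, with examinee ability \theta, item discrimination a_i, and category thresholds b_{ik} (the thresholds whose recovery the study evaluates):

P^{*}_{ik}(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_{ik})]}, \qquad P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),

with the conventions P^{*}_{i0}(\theta) = 1 and P^{*}_{i,m_i+1}(\theta) = 0 for an item with m_i + 1 ordered categories.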
Predicting and Evaluating Item Responses Using Machine Learning, Text Embeddings, and LLMs
Evelyn Johnson
|
Hsin-Ro Wei
|
Tong Wu
|
Huan Liu
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress
This work-in-progress study compares the accuracy of machine learning and large language models in predicting student responses to field-test items on a social-emotional learning assessment. We evaluate how well each method replicates actual responses and compare the item parameters generated from synthetic data to those derived from actual student data.