Michael Heilig
2025
Toward Automated Evaluation of AI-Generated Item Drafts in Clinical Assessment
Tazin Afrin, Le An Ha, Victoria Yaneva, Keelan Evanini, Steven Go, Kristine DeRuchie, Michael Heilig
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
This study examines the classification of AI-generated drafts of clinical multiple-choice questions as “helpful” or “non-helpful” starting points. Expert judgments were analyzed, and multiple classifiers were evaluated, including feature-based models, fine-tuned transformers, and few-shot prompting with GPT-4. Our findings highlight the challenges and considerations in evaluating AI-generated items for clinical test development.
Co-authors
- Tazin Afrin 1
- Kristine DeRuchie 1
- Keelan Evanini 1
- Steven Go 1
- Le An Ha 1