Towards Reliable Generation of Clinical Chart Items: A Counterfactual Reasoning Approach with Large Language Models

Jiaxuan Li, Saed Rezayi, Peter Baldwin, Polina Harik, Victoria Yaneva


Abstract
This study explores the use of GPT-4 to generate clinical chart items for medical education, comparing three prompting strategies. Expert evaluations rated many of the generated items as usable or promising. The counterfactual approach enhanced item novelty, and item quality improved with high-surprisal examples. This is the first investigation of LLMs for automated clinical chart item generation.

Anthology ID: 2025.aimecon-main.16
Volume: Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Month: October
Year: 2025
Address: Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors: Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue: AIME-Con
Publisher: National Council on Measurement in Education (NCME)
Pages: 142–153
URL: https://aclanthology.org/2025.aimecon-main.16/
Cite (ACL): Jiaxuan Li, Saed Rezayi, Peter Baldwin, Polina Harik, and Victoria Yaneva. 2025. Towards Reliable Generation of Clinical Chart Items: A Counterfactual Reasoning Approach with Large Language Models. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 142–153, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal): Towards Reliable Generation of Clinical Chart Items: A Counterfactual Reasoning Approach with Large Language Models (Li et al., AIME-Con 2025)
PDF: https://aclanthology.org/2025.aimecon-main.16.pdf