Beyond the Hint: Using Self-Critique to Constrain LLM Feedback in Conversation-Based Assessment

Tyler Burleigh, Jenny Han, Kristen Dicerbo


Abstract
Large Language Models (LLMs) used in Conversation-Based Assessment tend to provide inappropriate hints that compromise validity. We demonstrate that self-critique – a simple prompt engineering technique – effectively constrains this behavior. Across two studies, one using synthetic conversations and one using real-world high school math pilot data, self-critique reduced inappropriate hints by 90.7% and by 24–75%, respectively. Human experts validated ground-truth labels while LLM judges enabled evaluation at scale. This immediately deployable solution addresses the central tension in intermediate-stakes assessment: maintaining student engagement while ensuring fair comparisons. Our findings show that prompt engineering can meaningfully safeguard assessment integrity without model fine-tuning.
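The paper does not reproduce its prompts here, but the self-critique pattern the abstract describes can be illustrated with a minimal sketch. The `chat` function below is a hypothetical placeholder for a real LLM API call (stubbed so the example runs); the prompt wording and the `VIOLATION`/`OK` convention are illustrative assumptions, not the authors' actual implementation:

```python
def chat(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM completion call, stubbed
    # for illustration: "critiques" a draft by checking for an answer
    # giveaway, and otherwise returns a canned non-revealing reply.
    if "Critique" in prompt:
        return "VIOLATION" if "the answer is" in prompt.lower() else "OK"
    return "Think about which operation undoes multiplication."


def respond_with_self_critique(student_message: str, draft: str) -> str:
    """Ask the model to critique its own draft reply for inappropriate
    hints; if the critique flags a violation, regenerate the reply
    under an explicit constraint instead of sending the draft."""
    critique = chat(
        "Critique the tutor reply below. Respond VIOLATION if it "
        "reveals or strongly hints at the solution, otherwise OK.\n\n"
        f"Student: {student_message}\nTutor draft: {draft}"
    )
    if "VIOLATION" in critique:
        # Second pass: rewrite the reply without giving the answer away.
        return chat("Rewrite this reply without revealing the answer: " + draft)
    return draft
```

In a real deployment the two `chat` calls would go to the assessment system's LLM, and the critique prompt would encode the hint policy the paper evaluates.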
Anthology ID:
2025.aimecon-sessions.9
Volume:
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers
Month:
October
Year:
2025
Address:
Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
Editors:
Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
Venue:
AIME-Con
Publisher:
National Council on Measurement in Education (NCME)
Pages:
79–85
URL:
https://aclanthology.org/2025.aimecon-sessions.9/
Cite (ACL):
Tyler Burleigh, Jenny Han, and Kristen Dicerbo. 2025. Beyond the Hint: Using Self-Critique to Constrain LLM Feedback in Conversation-Based Assessment. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers, pages 79–85, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
Cite (Informal):
Beyond the Hint: Using Self-Critique to Constrain LLM Feedback in Conversation-Based Assessment (Burleigh et al., AIME-Con 2025)
PDF:
https://aclanthology.org/2025.aimecon-sessions.9.pdf