@inproceedings{wang-etal-2025-automatic,
title = "Automatic Scoring of an Open-Response Measure of Advanced Mind-Reading Using Large Language Models",
author = "Wang, Yixiao and
Dsouza, Russel and
Lee, Robert and
Apperly, Ian and
Devine, Rory and
van der Kleij, Sanne and
Lee, Mark",
editor = "Zirikly, Ayah and
Yates, Andrew and
Desmet, Bart and
Ireland, Molly and
Bedrick, Steven and
MacAvaney, Sean and
Bar, Kfir and
Ophir, Yaakov",
booktitle = "Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)",
month = may,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.clpsych-1.7/",
doi = "10.18653/v1/2025.clpsych-1.7",
pages = "79--89",
ISBN = "979-8-89176-226-8",
abstract = "A rigorous psychometric approach is crucial for the accurate measurement of mind-reading abilities. Traditional scoring methods for such tests, which involve lengthy free-text responses, require considerable time and human effort. This study investigates the use of large language models (LLMs) to automate the scoring of psychometric tests. Data were collected from participants aged 13 to 30 years and scored by trained human coders to establish a benchmark. We evaluated multiple LLMs against human assessments, exploring various prompting strategies to optimize performance and fine-tuning the models using a subset of the collected data to enhance accuracy. Our results demonstrate that LLMs can assess advanced mind-reading abilities with over 90{\%} accuracy on average. Notably, in most test items, the LLMs achieved higher Kappa agreement with the lead coder than two trained human coders, highlighting their potential to reliably score open-response psychometric tests."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wang-etal-2025-automatic">
<titleInfo>
<title>Automatic Scoring of an Open-Response Measure of Advanced Mind-Reading Using Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yixiao</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Russel</namePart>
<namePart type="family">Dsouza</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Robert</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ian</namePart>
<namePart type="family">Apperly</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rory</namePart>
<namePart type="family">Devine</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sanne</namePart>
<namePart type="family">van der Kleij</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mark</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-05</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ayah</namePart>
<namePart type="family">Zirikly</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andrew</namePart>
<namePart type="family">Yates</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bart</namePart>
<namePart type="family">Desmet</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Molly</namePart>
<namePart type="family">Ireland</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Steven</namePart>
<namePart type="family">Bedrick</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sean</namePart>
<namePart type="family">MacAvaney</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kfir</namePart>
<namePart type="family">Bar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yaakov</namePart>
<namePart type="family">Ophir</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Albuquerque, New Mexico</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-226-8</identifier>
</relatedItem>
<abstract>A rigorous psychometric approach is crucial for the accurate measurement of mind-reading abilities. Traditional scoring methods for such tests, which involve lengthy free-text responses, require considerable time and human effort. This study investigates the use of large language models (LLMs) to automate the scoring of psychometric tests. Data were collected from participants aged 13 to 30 years and scored by trained human coders to establish a benchmark. We evaluated multiple LLMs against human assessments, exploring various prompting strategies to optimize performance and fine-tuning the models using a subset of the collected data to enhance accuracy. Our results demonstrate that LLMs can assess advanced mind-reading abilities with over 90% accuracy on average. Notably, in most test items, the LLMs achieved higher Kappa agreement with the lead coder than two trained human coders, highlighting their potential to reliably score open-response psychometric tests.</abstract>
<identifier type="citekey">wang-etal-2025-automatic</identifier>
<identifier type="doi">10.18653/v1/2025.clpsych-1.7</identifier>
<location>
<url>https://aclanthology.org/2025.clpsych-1.7/</url>
</location>
<part>
<date>2025-05</date>
<extent unit="page">
<start>79</start>
<end>89</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Automatic Scoring of an Open-Response Measure of Advanced Mind-Reading Using Large Language Models
%A Wang, Yixiao
%A Dsouza, Russel
%A Lee, Robert
%A Apperly, Ian
%A Devine, Rory
%A van der Kleij, Sanne
%A Lee, Mark
%Y Zirikly, Ayah
%Y Yates, Andrew
%Y Desmet, Bart
%Y Ireland, Molly
%Y Bedrick, Steven
%Y MacAvaney, Sean
%Y Bar, Kfir
%Y Ophir, Yaakov
%S Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)
%D 2025
%8 May
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-226-8
%F wang-etal-2025-automatic
%X A rigorous psychometric approach is crucial for the accurate measurement of mind-reading abilities. Traditional scoring methods for such tests, which involve lengthy free-text responses, require considerable time and human effort. This study investigates the use of large language models (LLMs) to automate the scoring of psychometric tests. Data were collected from participants aged 13 to 30 years and scored by trained human coders to establish a benchmark. We evaluated multiple LLMs against human assessments, exploring various prompting strategies to optimize performance and fine-tuning the models using a subset of the collected data to enhance accuracy. Our results demonstrate that LLMs can assess advanced mind-reading abilities with over 90% accuracy on average. Notably, in most test items, the LLMs achieved higher Kappa agreement with the lead coder than two trained human coders, highlighting their potential to reliably score open-response psychometric tests.
%R 10.18653/v1/2025.clpsych-1.7
%U https://aclanthology.org/2025.clpsych-1.7/
%U https://doi.org/10.18653/v1/2025.clpsych-1.7
%P 79-89
Markdown (Informal)
[Automatic Scoring of an Open-Response Measure of Advanced Mind-Reading Using Large Language Models](https://aclanthology.org/2025.clpsych-1.7/) (Wang et al., CLPsych 2025)
ACL
Yixiao Wang, Russel Dsouza, Robert Lee, Ian Apperly, Rory Devine, Sanne van der Kleij, and Mark Lee. 2025. Automatic Scoring of an Open-Response Measure of Advanced Mind-Reading Using Large Language Models. In Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025), pages 79–89, Albuquerque, New Mexico. Association for Computational Linguistics.