On the Reliability of Psychological Scales on Large Language Models

Jen-tse Huang; Wenxiang Jiao; Man Ho Lam; Eric John Li; Wenxuan Wang; Michael Lyu

doi:10.18653/v1/2024.emnlp-main.354

Abstract

Recent research has focused on examining Large Language Models’ (LLMs) characteristics from a psychological standpoint, acknowledging the necessity of understanding their behavioral characteristics. The administration of personality tests to LLMs has emerged as a noteworthy area in this context. However, the suitability of employing psychological scales, initially devised for humans, on LLMs is a matter of ongoing debate. Our study aims to determine the reliability of applying personality assessments to LLMs, explicitly investigating whether LLMs demonstrate consistent personality traits. Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory, indicating a satisfactory level of reliability. Furthermore, our research explores the potential of GPT-3.5 to emulate diverse personalities and represent various groups—a capability increasingly sought after in social sciences for substituting human participants with LLMs to reduce costs. Our findings reveal that LLMs have the potential to represent different personalities with specific prompt instructions.

Anthology ID: 2024.emnlp-main.354
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6152–6173
URL: https://aclanthology.org/2024.emnlp-main.354/
DOI: 10.18653/v1/2024.emnlp-main.354
Cite (ACL): Jen-tse Huang, Wenxiang Jiao, Man Ho Lam, Eric John Li, Wenxuan Wang, and Michael Lyu. 2024. On the Reliability of Psychological Scales on Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6152–6173, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): On the Reliability of Psychological Scales on Large Language Models (Huang et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.354.pdf
Data: 2024.emnlp-main.354.data.zip
