Is Machine Psychology here? On Requirements for Using Human Psychological Tests on Large Language Models

Lea Löhn, Niklas Kiehne, Alexander Ljapunov, Wolf-Tilo Balke


Abstract
In an effort to better understand the behavior of large language models (LLMs), researchers have recently turned to conducting psychological assessments on them. Several studies diagnose various psychological concepts in LLMs, such as psychopathological symptoms, personality traits, and intellectual functioning, aiming to unravel their black-box characteristics. But can we safely assess LLMs with tests that were originally designed for humans? The field of psychology looks back on decades of developing standards for appropriate testing procedures to ensure reliable and valid measures. We argue that analogous standardization processes are required for LLM assessments, given their differential functioning as compared to humans. In this paper, we propose seven requirements necessary for testing LLMs. Based on these, we critically reflect on a sample of 25 recent machine psychology studies. Our analysis reveals (1) the lack of appropriate methods to assess test reliability and construct validity, (2) the unknown strength of construct-irrelevant influences, such as the contamination of pre-training corpora with test material, and (3) the pervasive issue of non-reproducibility of many studies. The results underscore the lack of a general methodology for implementing psychological assessments of LLMs and the need to redefine psychological constructs specifically for large language models rather than adopting them from human psychology.
Anthology ID:
2024.inlg-main.19
Volume:
Proceedings of the 17th International Natural Language Generation Conference
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
230–242
URL:
https://aclanthology.org/2024.inlg-main.19
Cite (ACL):
Lea Löhn, Niklas Kiehne, Alexander Ljapunov, and Wolf-Tilo Balke. 2024. Is Machine Psychology here? On Requirements for Using Human Psychological Tests on Large Language Models. In Proceedings of the 17th International Natural Language Generation Conference, pages 230–242, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Is Machine Psychology here? On Requirements for Using Human Psychological Tests on Large Language Models (Löhn et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-main.19.pdf