Quantifying Data Contamination in Psychometric Evaluations of LLMs

Jongwook Han; Woojung Song; Jonggeun Lee; Yohan Jo

Abstract

Recent studies apply psychometric questionnaires to Large Language Models (LLMs) to assess high-level psychological constructs such as values, personality, moral foundations, and dark traits. Although prior work has raised concerns about possible data contamination from psychometric inventories, which may threaten the reliability of such evaluations, there has been no systematic attempt to quantify the extent of this contamination. To address this gap, we propose a framework to systematically measure data contamination in psychometric evaluations of LLMs, evaluating three aspects: (1) item memorization, (2) evaluation memorization, and (3) target score matching. Applying this framework to 21 models from major families and four widely used psychometric inventories, we provide evidence that popular inventories such as the Big Five Inventory (BFI-44) and Portrait Values Questionnaire (PVQ-40) exhibit strong contamination, where models not only memorize items but can also adjust their responses to achieve specific target scores.
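The abstract does not say how target score matching is computed. As a rough illustration only, assume a Likert inventory scored as the mean of a trait's items, with reverse-keyed items flipped. This is the conventional scoring for instruments like BFI-44, and `scale_max` allows other scale lengths. Under that assumption, one could compare the score a model actually achieves with the score it was asked to produce. The function names, the toy key, and the mean-absolute-error metric below are hypothetical and are not the authors' method.

```python
from statistics import mean


def score_trait(responses, items, reverse_keyed, scale_max=5):
    """Average Likert responses over one trait's items (hypothetical helper).

    responses: dict mapping item id -> integer response in 1..scale_max
    items: item ids belonging to the trait
    reverse_keyed: set of item ids that are reverse-scored
    """
    values = []
    for item in items:
        r = responses[item]
        if not 1 <= r <= scale_max:
            raise ValueError(f"response {r} to item {item} is outside 1..{scale_max}")
        # Flip reverse-keyed items so a higher value always means more of the trait.
        values.append(scale_max + 1 - r if item in reverse_keyed else r)
    return mean(values)


def target_score_error(responses, key, targets, scale_max=5):
    """Mean absolute gap between achieved and requested trait scores.

    key: dict mapping trait -> (item ids, set of reverse-keyed item ids)
    targets: dict mapping trait -> score the model was asked to obtain
    The choice of mean absolute error is an assumption for illustration.
    """
    gaps = []
    for trait, target in targets.items():
        items, reverse_keyed = key[trait]
        achieved = score_trait(responses, items, reverse_keyed, scale_max)
        gaps.append(abs(achieved - target))
    return mean(gaps)


# Toy key and responses (hypothetical, not real BFI-44 items).
key = {"E": ([1, 2, 3], {2}), "N": ([4, 5], {5})}
responses = {1: 4, 2: 2, 3: 4, 4: 2, 5: 4}
targets = {"E": 4.5, "N": 2.0}
print(target_score_error(responses, key, targets))  # 0.25
```

A model whose answers reflect contamination would, under this sketch, drive the error toward zero for arbitrary targets. A model answering without knowledge of the scoring key would be expected to land further from the requested scores.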

Anthology ID:: 2026.findings-eacl.319
Volume:: Findings of the Association for Computational Linguistics: EACL 2026
Month:: March
Year:: 2026
Address:: Rabat, Morocco
Editors:: Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 6070–6088
URL:: https://aclanthology.org/2026.findings-eacl.319/
Cite (ACL):: Jongwook Han, Woojung Song, Jonggeun Lee, and Yohan Jo. 2026. Quantifying Data Contamination in Psychometric Evaluations of LLMs. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6070–6088, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):: Quantifying Data Contamination in Psychometric Evaluations of LLMs (Han et al., Findings 2026)
PDF:: https://aclanthology.org/2026.findings-eacl.319.pdf
Checklist:: 2026.findings-eacl.319.checklist.pdf
