@inproceedings{singh-etal-2025-systematic,
  title     = {Systematic Evaluation of Auto-Encoding and Large Language Model Representations for Capturing Author States and Traits},
  author    = {Singh, Khushboo and
               Varadarajan, Vasudha and
               V. Ganesan, Adithya and
               Nilsson, August H{\r{a}}kan and
               Soni, Nikita and
               Mahwish, Syeda and
               Chitale, Pranav and
               Boyd, Ryan L. and
               Ungar, Lyle and
               Rosenthal, Richard N. and
               Schwartz, H. Andrew},
  editor    = {Che, Wanxiang and
               Nabende, Joyce and
               Shutova, Ekaterina and
               Pilehvar, Mohammad Taher},
  booktitle = {Findings of the Association for Computational Linguistics: {ACL} 2025},
  month     = jul,
  year      = {2025},
  address   = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.findings-acl.971/},
  doi       = {10.18653/v1/2025.findings-acl.971},
  pages     = {18955--18973},
  isbn      = {979-8-89176-256-5},
  abstract  = {Large Language Models (LLMs) are increasingly used in human-centered applications, yet their ability to model diverse psychological constructs is not well understood. In this study, we systematically evaluate a range of Transformer-LMs to predict psychological variables across five major dimensions: affect, substance use, mental health, sociodemographics, and personality. Analyses span three temporal levels{---}short daily text responses about current affect, text aggregated over two-weeks, and user-level text collected over two years{---}allowing us to examine how each model{'}s strengths align with the underlying stability of different constructs. The findings show that mental health signals emerge as the most accurately predicted dimensions (r=0.6) across all temporal scales. At the daily scale, smaller models like DeBERTa and HaRT often performed better, whereas, at longer scales or with greater context, larger model like Llama3-8B performed the best. Also, aggregating text over the entire study period yielded stronger correlations for outcomes, such as age and income. Overall, these results suggest the importance of selecting appropriate model architectures and temporal aggregation techniques based on the stability and nature of the target variable.},
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="singh-etal-2025-systematic">
<titleInfo>
<title>Systematic Evaluation of Auto-Encoding and Large Language Model Representations for Capturing Author States and Traits</title>
</titleInfo>
<name type="personal">
<namePart type="given">Khushboo</namePart>
<namePart type="family">Singh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vasudha</namePart>
<namePart type="family">Varadarajan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Adithya</namePart>
<namePart type="family">V. Ganesan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">August</namePart>
<namePart type="given">Håkan</namePart>
<namePart type="family">Nilsson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nikita</namePart>
<namePart type="family">Soni</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Syeda</namePart>
<namePart type="family">Mahwish</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pranav</namePart>
<namePart type="family">Chitale</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ryan</namePart>
<namePart type="given">L</namePart>
<namePart type="family">Boyd</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lyle</namePart>
<namePart type="family">Ungar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Richard</namePart>
<namePart type="given">N</namePart>
<namePart type="family">Rosenthal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">H</namePart>
<namePart type="given">Andrew</namePart>
<namePart type="family">Schwartz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) are increasingly used in human-centered applications, yet their ability to model diverse psychological constructs is not well understood. In this study, we systematically evaluate a range of Transformer-LMs to predict psychological variables across five major dimensions: affect, substance use, mental health, sociodemographics, and personality. Analyses span three temporal levels—short daily text responses about current affect, text aggregated over two-weeks, and user-level text collected over two years—allowing us to examine how each model’s strengths align with the underlying stability of different constructs. The findings show that mental health signals emerge as the most accurately predicted dimensions (r=0.6) across all temporal scales. At the daily scale, smaller models like DeBERTa and HaRT often performed better, whereas, at longer scales or with greater context, larger model like Llama3-8B performed the best. Also, aggregating text over the entire study period yielded stronger correlations for outcomes, such as age and income. Overall, these results suggest the importance of selecting appropriate model architectures and temporal aggregation techniques based on the stability and nature of the target variable.</abstract>
<identifier type="citekey">singh-etal-2025-systematic</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.971</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.971/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>18955</start>
<end>18973</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Systematic Evaluation of Auto-Encoding and Large Language Model Representations for Capturing Author States and Traits
%A Singh, Khushboo
%A Varadarajan, Vasudha
%A V. Ganesan, Adithya
%A Nilsson, August Håkan
%A Soni, Nikita
%A Mahwish, Syeda
%A Chitale, Pranav
%A Boyd, Ryan L.
%A Ungar, Lyle
%A Rosenthal, Richard N.
%A Schwartz, H. Andrew
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F singh-etal-2025-systematic
%X Large Language Models (LLMs) are increasingly used in human-centered applications, yet their ability to model diverse psychological constructs is not well understood. In this study, we systematically evaluate a range of Transformer-LMs to predict psychological variables across five major dimensions: affect, substance use, mental health, sociodemographics, and personality. Analyses span three temporal levels—short daily text responses about current affect, text aggregated over two-weeks, and user-level text collected over two years—allowing us to examine how each model’s strengths align with the underlying stability of different constructs. The findings show that mental health signals emerge as the most accurately predicted dimensions (r=0.6) across all temporal scales. At the daily scale, smaller models like DeBERTa and HaRT often performed better, whereas, at longer scales or with greater context, larger model like Llama3-8B performed the best. Also, aggregating text over the entire study period yielded stronger correlations for outcomes, such as age and income. Overall, these results suggest the importance of selecting appropriate model architectures and temporal aggregation techniques based on the stability and nature of the target variable.
%R 10.18653/v1/2025.findings-acl.971
%U https://aclanthology.org/2025.findings-acl.971/
%U https://doi.org/10.18653/v1/2025.findings-acl.971
%P 18955-18973
Markdown (Informal)
[Systematic Evaluation of Auto-Encoding and Large Language Model Representations for Capturing Author States and Traits](https://aclanthology.org/2025.findings-acl.971/) (Singh et al., Findings 2025)
ACL
- Khushboo Singh, Vasudha Varadarajan, Adithya V. Ganesan, August Håkan Nilsson, Nikita Soni, Syeda Mahwish, Pranav Chitale, Ryan L. Boyd, Lyle Ungar, Richard N. Rosenthal, and H. Andrew Schwartz. 2025. Systematic Evaluation of Auto-Encoding and Large Language Model Representations for Capturing Author States and Traits. In Findings of the Association for Computational Linguistics: ACL 2025, pages 18955–18973, Vienna, Austria. Association for Computational Linguistics.