On Using Self-Report Studies to Analyze Language Models

Matúš Pikuliak


Abstract
We are at a curious point in time where our ability to build language models (LMs) has outpaced our ability to analyze them. We do not really know how to reliably determine their capabilities, biases, dangers, knowledge, and so on. The benchmarks we have are often overly specific, do not generalize well, and are susceptible to data leakage. Recently, I have noticed a trend of using self-report studies, such as various polls and questionnaires originally designed for humans, to analyze the properties of LMs. I think that this approach can easily lead to false results, which can be quite dangerous considering the current discussions on AI safety, governance, and regulation. To illustrate my point, I will delve deeper into several papers that employ self-report methodologies and I will try to highlight some of their weaknesses.
Anthology ID: 2024.nejlt-1.5
Volume: Northern European Journal of Language Technology, Volume 10
Month: December
Year: 2024
Address: Linköping, Sweden
Editor: Marcel Bollmann
Venue: NEJLT
Publisher: Linköping University Electronic Press
Pages: 78–85
URL: https://aclanthology.org/2024.nejlt-1.5/
DOI: 10.3384/nejlt.2000-1533.2024.5000
Cite (ACL): Matúš Pikuliak. 2024. On Using Self-Report Studies to Analyze Language Models. Northern European Journal of Language Technology, 10:78–85.
Cite (Informal): On Using Self-Report Studies to Analyze Language Models (Pikuliak, NEJLT 2024)
PDF: https://aclanthology.org/2024.nejlt-1.5.pdf