Characterizing Human and Zero-Shot GPT-3.5 Object-Similarity Judgments

D McKnight, Alona Fyshe
Abstract
Recent advances in the capabilities of large language models (LLMs) have yielded few-shot, human-comparable performance on a range of tasks. At the same time, researchers expend significant effort and resources gathering human annotations. LLMs may eventually be able to perform some simple annotation tasks, but studies of LLM annotation accuracy and behavior are sparse. In this paper, we characterize the judgments of OpenAI's GPT-3.5 on a behavioral task for implicit object categorization. We compare the embedding spaces of models trained on human vs. GPT-3.5 responses, identifying similarities and differences between them and finding many shared dimensions. We also find that, despite these shared dimensions, augmenting human responses with GPT-3.5 ones drives model divergence across all dataset sizes tested.
Anthology ID:
2024.findings-naacl.242
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3810–3828
URL:
https://aclanthology.org/2024.findings-naacl.242
Cite (ACL):
D McKnight and Alona Fyshe. 2024. Characterizing Human and Zero-Shot GPT-3.5 Object-Similarity Judgments. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3810–3828, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Characterizing Human and Zero-Shot GPT-3.5 Object-Similarity Judgments (McKnight & Fyshe, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.242.pdf
Copyright:
2024.findings-naacl.242.copyright.pdf