D McKnight
2024
Characterizing Human and Zero-Shot GPT-3.5 Object-Similarity Judgments
D McKnight
|
Alona Fyshe
Findings of the Association for Computational Linguistics: NAACL 2024
Recent advancements in large language models’ (LLMs) capabilities have yielded few-shot, human-comparable performance on a range of tasks. At the same time, researchers expend significant effort and resources gathering human annotations. At some point, LLMs may be able to perform some simple annotation tasks, but studies of LLM annotation accuracy and behavior are sparse. In this paper, we characterize OpenAI’s GPT-3.5’s judgment on a behavioral task for implicit object categorization. We characterize the embedding spaces of models trained on human vs. GPT responses and give similarities and differences between them, finding many similar dimensions. We also find that despite these similar dimensions, augmenting humans’ responses with GPT ones drives model divergence across the sizes of datasets tested.
Search