AnthroSet: a Challenge Dataset for Anthropomorphic Language Detection

Dorielle Lonke, Jelke Bloem, Pia Sommerauer


Abstract
This paper addresses the challenge of detecting anthropomorphic language in AI research. We introduce AnthroSet, a novel dataset of 600 manually annotated utterances covering various linguistic structures. Through the evaluation of two current approaches for anthropomorphism and atypical animacy detection, we highlight the limitations of a masked language model approach, arising from masking constraints as well as increasingly anthropomorphizing AI-related terminology. Our findings underscore the need for more targeted methods and a robust definition of anthropomorphism.
Anthology ID:
2025.ommm-1.3
Volume:
Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Piotr Przybyła, Matthew Shardlow, Clara Colombatto, Nanna Inie
Venues:
OMMM | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
27–39
URL:
https://aclanthology.org/2025.ommm-1.3/
Cite (ACL):
Dorielle Lonke, Jelke Bloem, and Pia Sommerauer. 2025. AnthroSet: a Challenge Dataset for Anthropomorphic Language Detection. In Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models, pages 27–39, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
AnthroSet: a Challenge Dataset for Anthropomorphic Language Detection (Lonke et al., OMMM 2025)
PDF:
https://aclanthology.org/2025.ommm-1.3.pdf