BibTeX
@inproceedings{schafer-etal-2025-demographics,
title = "Which Demographics do {LLM}s Default to During Annotation?",
author = {Sch{\"a}fer, Johannes and
Combs, Aidan and
Bagdon, Christopher and
Li, Jiahui and
Probol, Nadine and
Greschner, Lynn and
Papay, Sean and
Menchaca Resendiz, Yarik and
Velutharambath, Aswathy and
Wuehrl, Amelie and
Weber, Sabine and
Klinger, Roman},
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.848/",
doi = "10.18653/v1/2025.acl-long.848",
pages = "17331--17348",
ISBN = "979-8-89176-251-0",
abstract = "Demographics and cultural background of annotators influence the labels they assign in text annotation {--} for instance, an elderly woman might find it offensive to read a message addressed to a ``bro'', but a male teenager might find it appropriate. It is therefore important to acknowledge label variations to not under-represent members of a society. Two research directions developed out of this observation in the context of using large language models (LLM) for data annotations, namely (1) studying biases and inherent knowledge of LLMs and (2) injecting diversity in the output by manipulating the prompt with demographic information. We combine these two strands of research and ask the question to which demographics an LLM resorts to when no demographics is given. To answer this question, we evaluate which attributes of human annotators LLMs inherently mimic. Furthermore, we compare non-demographic conditioned prompts and placebo-conditioned prompts (e.g., ``you are an annotator who lives in house number 5'') to demographics-conditioned prompts ({``}You are a 45 year old man and an expert on politeness annotation. How do you rate instance''). We study these questions for politeness and offensiveness annotations on the POPQUORN data set, a corpus created in a controlled manner to investigate human label variations based on demographics which has not been used for LLM-based analyses so far. We observe notable influences related to gender, race, and age in demographic prompting, which contrasts with previous studies that found no such effects."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="schafer-etal-2025-demographics">
<titleInfo>
<title>Which Demographics do LLMs Default to During Annotation?</title>
</titleInfo>
<name type="personal">
<namePart type="given">Johannes</namePart>
<namePart type="family">Schäfer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aidan</namePart>
<namePart type="family">Combs</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christopher</namePart>
<namePart type="family">Bagdon</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiahui</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nadine</namePart>
<namePart type="family">Probol</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lynn</namePart>
<namePart type="family">Greschner</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sean</namePart>
<namePart type="family">Papay</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yarik</namePart>
<namePart type="family">Menchaca Resendiz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aswathy</namePart>
<namePart type="family">Velutharambath</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amelie</namePart>
<namePart type="family">Wuehrl</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sabine</namePart>
<namePart type="family">Weber</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Roman</namePart>
<namePart type="family">Klinger</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-251-0</identifier>
</relatedItem>
<abstract>Demographics and cultural background of annotators influence the labels they assign in text annotation – for instance, an elderly woman might find it offensive to read a message addressed to a “bro”, but a male teenager might find it appropriate. It is therefore important to acknowledge label variations to not under-represent members of a society. Two research directions developed out of this observation in the context of using large language models (LLM) for data annotations, namely (1) studying biases and inherent knowledge of LLMs and (2) injecting diversity in the output by manipulating the prompt with demographic information. We combine these two strands of research and ask the question to which demographics an LLM resorts to when no demographics is given. To answer this question, we evaluate which attributes of human annotators LLMs inherently mimic. Furthermore, we compare non-demographic conditioned prompts and placebo-conditioned prompts (e.g., “you are an annotator who lives in house number 5”) to demographics-conditioned prompts (“You are a 45 year old man and an expert on politeness annotation. How do you rate instance”). We study these questions for politeness and offensiveness annotations on the POPQUORN data set, a corpus created in a controlled manner to investigate human label variations based on demographics which has not been used for LLM-based analyses so far. We observe notable influences related to gender, race, and age in demographic prompting, which contrasts with previous studies that found no such effects.</abstract>
<identifier type="citekey">schafer-etal-2025-demographics</identifier>
<identifier type="doi">10.18653/v1/2025.acl-long.848</identifier>
<location>
<url>https://aclanthology.org/2025.acl-long.848/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>17331</start>
<end>17348</end>
</extent>
</part>
</mods>
</modsCollection>

Endnote
%0 Conference Proceedings
%T Which Demographics do LLMs Default to During Annotation?
%A Schäfer, Johannes
%A Combs, Aidan
%A Bagdon, Christopher
%A Li, Jiahui
%A Probol, Nadine
%A Greschner, Lynn
%A Papay, Sean
%A Menchaca Resendiz, Yarik
%A Velutharambath, Aswathy
%A Wuehrl, Amelie
%A Weber, Sabine
%A Klinger, Roman
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F schafer-etal-2025-demographics
%X Demographics and cultural background of annotators influence the labels they assign in text annotation: for instance, an elderly woman might find it offensive to read a message addressed to a “bro”, but a male teenager might find it appropriate. It is therefore important to acknowledge label variation so as not to under-represent members of society. Two research directions developed out of this observation in the context of using large language models (LLMs) for data annotation, namely (1) studying biases and inherent knowledge of LLMs and (2) injecting diversity into the output by manipulating the prompt with demographic information. We combine these two strands of research and ask which demographics an LLM resorts to when none are given. To answer this question, we evaluate which attributes of human annotators LLMs inherently mimic. Furthermore, we compare prompts without demographic conditioning and placebo-conditioned prompts (e.g., “you are an annotator who lives in house number 5”) to demographics-conditioned prompts (“You are a 45-year-old man and an expert on politeness annotation. How do you rate instance”). We study these questions for politeness and offensiveness annotations on the POPQUORN data set, a corpus created in a controlled manner to investigate human label variation based on demographics, which has not previously been used for LLM-based analyses. We observe notable influences related to gender, race, and age in demographic prompting, which contrasts with previous studies that found no such effects.
%R 10.18653/v1/2025.acl-long.848
%U https://aclanthology.org/2025.acl-long.848/
%U https://doi.org/10.18653/v1/2025.acl-long.848
%P 17331-17348

Markdown (Informal)
[Which Demographics do LLMs Default to During Annotation?](https://aclanthology.org/2025.acl-long.848/) (Schäfer et al., ACL 2025)
ACL
Johannes Schäfer, Aidan Combs, Christopher Bagdon, Jiahui Li, Nadine Probol, Lynn Greschner, Sean Papay, Yarik Menchaca Resendiz, Aswathy Velutharambath, Amelie Wuehrl, Sabine Weber, and Roman Klinger. 2025. Which Demographics do LLMs Default to During Annotation? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17331–17348, Vienna, Austria. Association for Computational Linguistics.
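
The abstract contrasts three prompting conditions: no persona, a demographically empty "placebo" persona, and a demographics-conditioned persona. As a minimal sketch of how such conditions might be assembled for an annotation query (the `build_prompt` helper, the rating-scale wording, and everything beyond the two persona strings quoted in the abstract are assumptions, not the authors' implementation):

```python
# Hypothetical sketch of the three prompt conditions described in the
# abstract. Only the two persona strings come from the abstract itself;
# the task wording and rating scale are invented for illustration.

def build_prompt(text: str, condition: str) -> str:
    """Compose a politeness-annotation prompt for one of three conditions."""
    personas = {
        "none": "",  # baseline: no persona given at all
        "placebo": "You are an annotator who lives in house number 5. ",
        "demographic": ("You are a 45-year-old man and an expert "
                        "on politeness annotation. "),
    }
    return (
        personas[condition]
        + "How do you rate the politeness of the following message, "
        + "on a scale from 1 (not polite at all) to 5 (very polite)?\n\n"
        + text
    )

if __name__ == "__main__":
    message = "Hey bro, can you send me that file?"
    for condition in ("none", "placebo", "demographic"):
        print(f"--- {condition} ---\n{build_prompt(message, condition)}\n")
```

Per the abstract, comparing a model's ratings under these conditions against POPQUORN's demographically annotated human labels is how one can probe which demographics the model defaults to when no persona is supplied.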