Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models

Myra Cheng, Esin Durmus, Dan Jurafsky


Abstract
To recognize and mitigate harms from large language models (LLMs), we need to understand the prevalence and nuances of stereotypes in LLM outputs. Toward this end, we present Marked Personas, a prompt-based method to measure stereotypes in LLMs for intersectional demographic groups without any lexicon or data labeling. Grounded in the sociolinguistic concept of markedness (which characterizes explicitly linguistically marked categories versus unmarked defaults), our proposed method is twofold: 1) prompting an LLM to generate personas, i.e., natural language descriptions, of the target demographic group alongside personas of unmarked, default groups; 2) identifying the words that significantly distinguish personas of the target group from corresponding unmarked ones. We find that the portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts. The words distinguishing personas of marked (non-white, non-male) groups reflect patterns of othering and exoticizing these demographics. An intersectional lens further reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women. These representational harms have concerning implications for downstream applications like story generation.
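The abstract's two-step method lends itself to a compact illustration: generate personas for a marked group and its unmarked default via prompts, then find the words that statistically separate the two sets of texts. Below is a minimal Python sketch of that second step using the log-odds ratio with a Dirichlet prior (Monroe et al., 2008), a standard statistic for finding words that distinguish two corpora and the kind of test the paper describes; the prompt wording and the commented-out `llm` generation call are illustrative placeholders, not the paper's exact setup.

```python
# Sketch of the two-step Marked Personas pipeline, assuming persona
# texts have already been generated by an LLM. `llm` below is a
# hypothetical stand-in for an actual model API call.
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and extract word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def log_odds_with_prior(target_docs, unmarked_docs, alpha=0.01):
    """Log-odds ratio with a Dirichlet prior (Monroe et al., 2008).
    Positive z-scores mark words significantly more frequent in the
    target (marked) personas than in the unmarked ones."""
    y_i = Counter(w for d in target_docs for w in tokenize(d))
    y_j = Counter(w for d in unmarked_docs for w in tokenize(d))
    n_i, n_j = sum(y_i.values()), sum(y_j.values())
    vocab = set(y_i) | set(y_j)
    a0 = alpha * len(vocab)  # total prior pseudo-count
    z_scores = {}
    for w in vocab:
        l_i = math.log((y_i[w] + alpha) / (n_i + a0 - y_i[w] - alpha))
        l_j = math.log((y_j[w] + alpha) / (n_j + a0 - y_j[w] - alpha))
        var = 1.0 / (y_i[w] + alpha) + 1.0 / (y_j[w] + alpha)
        z_scores[w] = (l_i - l_j) / math.sqrt(var)
    return z_scores

# Step 1: prompt an LLM for personas of a marked group and the
# corresponding unmarked default (illustrative prompt, not the
# paper's exact wording):
# prompt = "Describe a {group} person."
# marked = [llm(prompt.format(group="Black woman")) for _ in range(50)]
# unmarked = [llm(prompt.format(group="white man")) for _ in range(50)]

# Step 2: words with the highest z-scores distinguish the marked personas.
# z = log_odds_with_prior(marked, unmarked)
# top = sorted(z.items(), key=lambda kv: -kv[1])[:20]
```

In this framing, no lexicon or labeled data is needed: the unmarked personas serve as the control corpus, so any word that surfaces with a high z-score is one the model disproportionately attaches to the marked group.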
Anthology ID:
2023.acl-long.84
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1504–1532
URL:
https://aclanthology.org/2023.acl-long.84
DOI:
10.18653/v1/2023.acl-long.84
Cite (ACL):
Myra Cheng, Esin Durmus, and Dan Jurafsky. 2023. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1504–1532, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models (Cheng et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.84.pdf
Video:
https://aclanthology.org/2023.acl-long.84.mp4