Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs

Indira Sen; Marlene Lutz; Elisa Rogers; David Garcia; Markus Strohmaier

doi:10.18653/v1/2025.findings-acl.1246

Abstract

Many applications of Large Language Models (LLMs) require them to either simulate people or offer personalized functionality, making the demographic representativeness of LLMs crucial for equitable utility. At the same time, we know little about the extent to which these models actually reflect the demographic attributes and behaviors of certain groups or populations, with conflicting findings in empirical research. To shed light on this debate, we review 211 papers on the demographic representativeness of LLMs. We find that while 29% of the studies report positive conclusions on the representativeness of LLMs, 30% of these do not evaluate LLMs across multiple demographic categories or within demographic subcategories. Another 35% and 47% of the papers concluding positively fail to specify these subcategories altogether for gender and race, respectively. Of the articles that do report subcategories, fewer than half include marginalized groups in their study. Finally, more than a third of the papers do not define the target population to whom their findings apply; of those that do define it either implicitly or explicitly, a large majority study only the U.S. Taken together, our findings suggest an inflated perception of LLM representativeness in the broader community. We recommend more precise evaluation methods and comprehensive documentation of demographic attributes to ensure the responsible use of LLMs for social applications.

Anthology ID: 2025.findings-acl.1246
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 24263–24289
URL: https://aclanthology.org/2025.findings-acl.1246/
DOI: 10.18653/v1/2025.findings-acl.1246
Cite (ACL): Indira Sen, Marlene Lutz, Elisa Rogers, David Garcia, and Markus Strohmaier. 2025. Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs. In Findings of the Association for Computational Linguistics: ACL 2025, pages 24263–24289, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs (Sen et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1246.pdf