Introducing MARB — A Dataset for Studying the Social Dimensions of Reporting Bias in Language Models

Tom Södahl Bladsjö; Ricardo Muñoz Sánchez

doi:10.18653/v1/2025.gebnlp-1.5

Abstract

Reporting bias is the tendency for speakers to omit unnecessary or obvious information while mentioning things they consider relevant or surprising. In descriptions of people, reporting bias can manifest as a tendency to over-report attributes that deviate from the norm. While social bias in language models has garnered a lot of attention in recent years, a majority of the existing work equates “bias” with “stereotypes”. We suggest reporting bias as an alternative lens through which to study how social attitudes manifest in language models. We present MARB, a diagnostic dataset for studying the interaction between social bias and reporting bias in language models. We use MARB to evaluate the off-the-shelf behavior of both masked and autoregressive language models and find signs of reporting bias with regard to marginalized identities, mirroring the reporting bias found in human text. This effect is particularly pronounced when taking gender into account, demonstrating the importance of considering intersectionality when studying social phenomena like biases.

Anthology ID: 2025.gebnlp-1.5
Volume: Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Karolina Stańczak, Debora Nozza
Venues: GeBNLP | WS
Publisher: Association for Computational Linguistics
Pages: 59–74
URL: https://aclanthology.org/2025.gebnlp-1.5/
DOI: 10.18653/v1/2025.gebnlp-1.5
Cite (ACL): Tom Södahl Bladsjö and Ricardo Muñoz Sánchez. 2025. Introducing MARB — A Dataset for Studying the Social Dimensions of Reporting Bias in Language Models. In Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 59–74, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Introducing MARB — A Dataset for Studying the Social Dimensions of Reporting Bias in Language Models (Södahl Bladsjö & Muñoz Sánchez, GeBNLP 2025)
PDF: https://aclanthology.org/2025.gebnlp-1.5.pdf
