Assessing Gender Bias in Wikipedia: Inequalities in Article Titles

Agnieszka Falenska, Özlem Çetinoğlu


Abstract
Potential gender biases in Wikipedia’s content can contribute to biased behaviors in a variety of downstream NLP systems. Yet efforts to understand what inequalities occur in Wikipedia’s portrayal of women and men have so far focused only on *biographies*, leaving open the question of how often such harmful patterns occur in other topics. In this paper, we investigate gender-related asymmetries in Wikipedia titles from *all domains*. We find that only half of gender-related articles, i.e., articles with words such as *women* or *male* in their titles, have symmetrical counterparts that describe the same concept for the other gender (and clearly state it in their titles). Among the remaining imbalanced cases, the vast majority of articles concern sports- and social-related issues. We provide insights on how such asymmetries can influence other Wikipedia components and propose steps towards reducing the frequency of the observed patterns.
Anthology ID:
2021.gebnlp-1.9
Volume:
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing
Month:
August
Year:
2021
Address:
Online
Editors:
Marta Costa-jussa, Hila Gonen, Christian Hardmeier, Kellie Webster
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
75–85
URL:
https://aclanthology.org/2021.gebnlp-1.9
DOI:
10.18653/v1/2021.gebnlp-1.9
Cite (ACL):
Agnieszka Falenska and Özlem Çetinoğlu. 2021. Assessing Gender Bias in Wikipedia: Inequalities in Article Titles. In Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing, pages 75–85, Online. Association for Computational Linguistics.
Cite (Informal):
Assessing Gender Bias in Wikipedia: Inequalities in Article Titles (Falenska & Çetinoğlu, GeBNLP 2021)
PDF:
https://aclanthology.org/2021.gebnlp-1.9.pdf
Code:
agnieszkafalenska/gebnlp2021
Data:
GAP Coreference Dataset