Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories

Kaytlin Chaloner, Alfredo Maldonado


Abstract
Prior work has shown that word embeddings capture human stereotypes, including gender bias. However, there is a lack of studies testing the presence of specific gender bias categories in word embeddings across diverse domains. This paper aims to fill this gap by applying the WEAT bias detection method to four sets of word embeddings trained on corpora from four different domains: news, social networking, biomedical and a gender-balanced corpus extracted from Wikipedia (GAP). We find that some domains are definitely more prone to gender bias than others, and that the categories of gender bias present also vary for each set of word embeddings. We detect some gender bias in GAP. We also propose a simple but novel method for discovering new bias categories by clustering word embeddings. We validate this method through WEAT’s hypothesis testing mechanism and find it useful for expanding the relatively small set of well-known gender bias word categories commonly used in the literature.
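The abstract refers to WEAT's effect size and hypothesis test without stating them. The sketch below follows the commonly cited WEAT formulation (Caliskan et al., 2017) in plain Python. It is an illustration, not the authors' released code: the function names and the toy 2-dimensional vectors are hypothetical, and the use of the sample standard deviation is an assumption, since implementations differ on this point.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of target sets X and Y, scaled by the
    standard deviation over X and Y combined (sample std: an assumption)."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


def weat_p_value(X, Y, A, B):
    """Exact one-sided permutation test: the share of equal-size
    re-partitions of X and Y whose test statistic exceeds the observed one.
    Only practical for small word sets; larger sets need random sampling."""
    scores = [association(w, A, B) for w in X + Y]
    total = sum(scores)
    n = len(X)
    # Test statistic: sum over X minus sum over Y, written as 2*sum(X) - total.
    observed = 2 * sum(scores[:n]) - total
    stats = [
        2 * sum(scores[i] for i in idx) - total
        for idx in combinations(range(len(scores)), n)
    ]
    return sum(s > observed for s in stats) / len(stats)


# Hypothetical toy embeddings, for illustration only.
A = [[1.0, 0.0], [0.9, 0.1]]  # attribute words of the first gender
B = [[0.0, 1.0], [0.1, 0.9]]  # attribute words of the second gender
X = [[1.0, 0.1], [0.8, 0.2]]  # target words placed near A
Y = [[0.1, 1.0], [0.2, 0.8]]  # target words placed near B

effect = weat_effect_size(X, Y, A, B)
p_value = weat_p_value(X, Y, A, B)
```

Under these assumptions, a positive effect size means the X words sit closer to the A attribute words than the Y words do, and a small p-value means few re-partitions of the target words produce a stronger association than the observed one.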
Anthology ID:
W19-3804
Volume:
Proceedings of the First Workshop on Gender Bias in Natural Language Processing
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
25–32
URL:
https://aclanthology.org/W19-3804
DOI:
10.18653/v1/W19-3804
Cite (ACL):
Kaytlin Chaloner and Alfredo Maldonado. 2019. Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 25–32, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories (Chaloner & Maldonado, GeBNLP 2019)
PDF:
https://aclanthology.org/W19-3804.pdf
Code:
alfredomg/GeBNLP2019
Data:
GAP Coreference Dataset