An Explainable Approach to Understanding Gender Stereotype Text

Manuela Jeyaraj, Sarah Delany


Abstract
Gender stereotypes are widely held beliefs and assumptions about the typical traits, behaviours, and roles of people of a particular gender in society. When such beliefs shape how people of a particular gender are described in text, they can harm individuals and lead to unfair treatment. This research aims to identify the words and language constructs that influence whether a text is considered a gender stereotype. To do so, a transformer model with attention is fine-tuned for gender stereotype detection. The words and language constructs driving the model’s decisions are then identified by combining attention-based and SHAP (SHapley Additive exPlanations)-based explainability approaches. Results show that adjectives and verbs were highly influential in predicting gender stereotypes. Furthermore, sentiment analysis showed that words describing male gender stereotypes were more positive than those describing female gender stereotypes.
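
As a rough illustration of the kind of pipeline the abstract describes, the sketch below (Python, using the Hugging Face transformers and shap libraries) shows how per-token SHAP attributions can be read off a fine-tuned text classifier. The model path and example sentence are placeholders, not the authors' actual checkpoint or data, and this is not the paper's code.

import shap
from transformers import pipeline

# Hypothetical fine-tuned stereotype classifier; the model path is a
# placeholder, not the checkpoint used in the paper.
clf = pipeline(
    "text-classification",
    model="path/to/finetuned-stereotype-classifier",
    top_k=None,  # return scores for every class, not just the top prediction
)

# shap.Explainer can wrap a transformers text-classification pipeline directly,
# using its tokenizer to mask tokens when estimating Shapley values.
explainer = shap.Explainer(clf)

# Placeholder input sentence.
shap_values = explainer(["She gave up her career to look after the children."])

# Per-token contributions towards each class; tokens with the largest positive
# values for the stereotype class are the most influential words.
print(shap_values.data[0])    # tokens
print(shap_values.values[0])  # attribution scores per token and class

In a setup like this, the SHAP scores can then be aggregated by part of speech (e.g. adjectives, verbs) or passed through a sentiment analyser to compare how male and female stereotypes are worded, along the lines the abstract reports.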
Anthology ID:
2024.gebnlp-1.4
Volume:
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, Debora Nozza
Venues:
GeBNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
45–59
URL:
https://aclanthology.org/2024.gebnlp-1.4
Cite (ACL):
Manuela Jeyaraj and Sarah Delany. 2024. An Explainable Approach to Understanding Gender Stereotype Text. In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 45–59, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
An Explainable Approach to Understanding Gender Stereotype Text (Jeyaraj & Delany, GeBNLP-WS 2024)
PDF:
https://aclanthology.org/2024.gebnlp-1.4.pdf