Manuela Jeyaraj
2024
An Explainable Approach to Understanding Gender Stereotype Text
Manuela Jeyaraj, Sarah Delany
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Gender stereotypes are widely held beliefs and assumptions about the typical traits, behaviours, and roles associated with people of a particular gender in society. When such beliefs shape how people of a given gender are described in text, they can harm individuals and lead to unfair treatment. This research aims to identify the words and language constructs that can lead a text to be considered a gender stereotype. To do so, a transformer model with attention is fine-tuned for gender stereotype detection. The words and language constructs that drive the model's decisions are then identified by combining attention-based and SHAP (SHapley Additive exPlanations)-based explainability approaches. Results show that adjectives and verbs were highly influential in predicting gender stereotypes. Furthermore, sentiment analysis showed that words describing male gender stereotypes were more positive than those used for female gender stereotypes.
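The abstract does not say exactly how the attention and SHAP scores are combined, so the following is only a minimal sketch of one plausible way to merge token-level explanations and aggregate them by part of speech. It assumes a simple rule: normalise the magnitudes of each score type, take a weighted mean (the `weight` parameter is hypothetical), then sum the result per POS tag. In practice the attention weights would come from the fine-tuned model, the SHAP values from an explainer, and the POS tags from a tagger. Here all three are hand-written toy values, not data or results from the paper.

```python
from collections import defaultdict


def normalise(scores):
    """Scale absolute scores so they sum to 1.

    Assumption: only the magnitude of a score matters here, not its sign.
    """
    total = sum(abs(s) for s in scores)
    if total == 0:
        return [0.0 for _ in scores]
    return [abs(s) / total for s in scores]


def combined_importance(attention, shap_values, weight=0.5):
    """Hypothetical combination rule: weighted mean of the two normalised scores."""
    if len(attention) != len(shap_values):
        raise ValueError("attention and SHAP scores must align token by token")
    a = normalise(attention)
    s = normalise(shap_values)
    return [weight * ai + (1 - weight) * si for ai, si in zip(a, s)]


def importance_by_pos(pos_tags, scores):
    """Sum token-level importance per part-of-speech tag."""
    totals = defaultdict(float)
    for tag, score in zip(pos_tags, scores):
        totals[tag] += score
    return dict(totals)


# Illustrative toy input only: NOT data or results from the paper.
tokens = ["she", "is", "too", "emotional", "to", "lead"]
pos_tags = ["PRON", "AUX", "ADV", "ADJ", "PART", "VERB"]
attention = [0.05, 0.05, 0.10, 0.40, 0.05, 0.35]  # hypothetical attention weights
shap_values = [0.01, -0.02, 0.05, 0.30, 0.00, 0.12]  # hypothetical SHAP values

scores = combined_importance(attention, shap_values)
by_pos = importance_by_pos(pos_tags, scores)

# Token with the highest combined importance.
top_token = max(zip(tokens, scores), key=lambda ts: ts[1])[0]
print("most influential token:", top_token)

# POS tags ranked by summed importance.
for tag, score in sorted(by_pos.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tag:5s} {score:.3f}")
```

With these toy inputs the adjective and the verb rank highest, but only because the inputs were chosen that way. The sketch shows the mechanics of aggregating explanations by word class and says nothing about the paper's actual findings. Averaging normalised magnitudes is just one design choice; intersecting the top-k tokens from each method would be another.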