On Gender Biases in Offensive Language Classification Models

Sanjana Marcé, Adam Poliak


Abstract
We explore whether neural Natural Language Processing models trained to identify offensive language in tweets contain gender biases. We add historically gendered and gender ambiguous American names to an existing offensive language evaluation set to determine whether models' predictions are sensitive or robust to gendered names. While we see some evidence that these models might be prone to biased stereotypes that men use more offensive language than women, our results indicate that these models' binary predictions might not greatly change based upon gendered names.
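The perturbation test described in the abstract — inserting gendered names into tweets and checking whether an offensive-language classifier's prediction changes — can be sketched roughly as below. The name lists, the insertion template, and the `classify` interface are illustrative assumptions for exposition, not the authors' released code or exact experimental setup.

```python
# Minimal sketch of a name-insertion perturbation test for gender bias in an
# offensive-language classifier. Name lists, the insertion template, and the
# `classify` callable are illustrative assumptions, not the paper's exact setup.

from typing import Callable, List

# Hypothetical example name lists; the paper uses historically gendered and
# gender-ambiguous American names.
MASCULINE_NAMES: List[str] = ["James", "Robert", "Michael"]
FEMININE_NAMES: List[str] = ["Mary", "Patricia", "Jennifer"]
AMBIGUOUS_NAMES: List[str] = ["Taylor", "Jordan", "Casey"]


def perturb(tweet: str, name: str) -> str:
    """Prepend an addressee name to the tweet (one simple way to add a name)."""
    return f"{name}, {tweet}"


def prediction_flip_rate(
    tweets: List[str],
    names: List[str],
    classify: Callable[[str], int],  # returns 1 if offensive, else 0
) -> float:
    """Fraction of (tweet, name) pairs where adding the name flips the label."""
    flips, total = 0, 0
    for tweet in tweets:
        base = classify(tweet)
        for name in names:
            flips += int(classify(perturb(tweet, name)) != base)
            total += 1
    return flips / total if total else 0.0


if __name__ == "__main__":
    # Toy classifier standing in for a trained neural model.
    def toy_classify(text: str) -> int:
        return int("idiot" in text.lower())

    tweets = ["you are an idiot", "have a nice day"]
    for label, names in [("masculine", MASCULINE_NAMES),
                         ("feminine", FEMININE_NAMES),
                         ("ambiguous", AMBIGUOUS_NAMES)]:
        rate = prediction_flip_rate(tweets, names, toy_classify)
        print(f"{label}: flip rate = {rate:.2f}")
```

Comparing flip rates (or score shifts) across the masculine, feminine, and ambiguous name lists gives a simple indicator of whether predictions are sensitive to gendered names, which is the kind of robustness the abstract reports on.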
Anthology ID:
2022.gebnlp-1.19
Volume:
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
July
Year:
2022
Address:
Seattle, Washington
Editors:
Christian Hardmeier, Christine Basta, Marta R. Costa-jussà, Gabriel Stanovsky, Hila Gonen
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
174–183
URL:
https://aclanthology.org/2022.gebnlp-1.19
DOI:
10.18653/v1/2022.gebnlp-1.19
Bibkey:
Cite (ACL):
Sanjana Marcé and Adam Poliak. 2022. On Gender Biases in Offensive Language Classification Models. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 174–183, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
On Gender Biases in Offensive Language Classification Models (Marcé & Poliak, GeBNLP 2022)
PDF:
https://aclanthology.org/2022.gebnlp-1.19.pdf
Data
OLID