Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?

Francielle Vargas, Isabelle Carvalho, Ali Hürriyetoğlu, Thiago Pardo, Fabrício Benevenuto


Abstract
Recent studies have shown that hate speech technologies may propagate social stereotypes against marginalized groups. Nevertheless, realistic approaches for assessing and mitigating biased technologies are still lacking. In this paper, we introduce a new approach to analyzing the potential of hate speech classifiers to reflect social stereotypes by contrasting stereotypical beliefs with counter-stereotypes. We empirically measure the distribution of stereotypical beliefs by analyzing the distinct classification of tuples containing stereotypes versus counter-stereotypes across machine learning models and datasets. Experimental results show that hate speech classifiers attribute unreal or negligent offensiveness to social identity groups, reflecting and reinforcing stereotypical beliefs about minorities. We also find that models that embed expert and context information from offensiveness markers show promising results in mitigating social stereotype bias, a step towards socially responsible hate speech detection.
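The contrast between stereotypes and counter-stereotypes described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the placeholder sentence pairs, and the keyword-based `toy_classifier` are all hypothetical and stand in for a real hate speech model and real data. The sketch only shows the general idea of comparing how often a classifier flags the stereotype side of a pair versus the counter-stereotype side.

```python
from typing import Callable, List, Tuple

# Each pair is (stereotype sentence, counter-stereotype sentence).
# The group names and sentences are placeholders, not data from the paper.
Pair = Tuple[str, str]


def offensive_rate(classify: Callable[[str], bool], texts: List[str]) -> float:
    """Fraction of texts that the classifier labels as offensive."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if classify(t)) / len(texts)


def stereotype_gap(classify: Callable[[str], bool], pairs: List[Pair]) -> float:
    """Offensive rate on stereotypes minus offensive rate on counter-stereotypes."""
    stereotypes = [s for s, _ in pairs]
    counters = [c for _, c in pairs]
    return offensive_rate(classify, stereotypes) - offensive_rate(classify, counters)


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in for a trained model: flags a few trigger words."""
    triggers = {"lazy", "dishonest"}
    return any(word.strip(".,").lower() in triggers for word in text.split())


pairs: List[Pair] = [
    ("Group A is lazy.", "Group A is hardworking."),
    ("Group B is dishonest.", "Group B is honest."),
    ("Group C is loud.", "Group C is quiet."),
]

gap = stereotype_gap(toy_classifier, pairs)
print(f"stereotype vs. counter-stereotype gap: {gap:.2f}")
```

A gap far from zero suggests that the label depends on whether a sentence expresses a stereotype or its counterpart. How to interpret the sign and size of the gap depends on the study design, which the abstract does not specify.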
Anthology ID:
2023.ranlp-1.126
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1187–1196
URL:
https://aclanthology.org/2023.ranlp-1.126
Cite (ACL):
Francielle Vargas, Isabelle Carvalho, Ali Hürriyetoğlu, Thiago Pardo, and Fabrício Benevenuto. 2023. Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes? In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 1187–1196, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes? (Vargas et al., RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.126.pdf