Darkness can not drive out darkness: Investigating Bias in Hate Speech Detection Models

Fatma Elsafoury


Abstract
It has become crucial to develop tools for automated hate speech and abuse detection. Such tools would help stop bullies and haters and provide a safer environment in which individuals, especially those from marginalized groups, can express themselves freely. However, recent research shows that machine learning models are biased and may make the right decisions for the wrong reasons. In this thesis, I set out to understand the performance of hate speech and abuse detection models and the different biases that could influence them. I show that hate speech and abuse detection models are subject not only to social bias but also to other types of bias that have not been explored before. Finally, I investigate the causal effect of social and intersectional bias on the performance and unfairness of hate speech detection models.
Anthology ID:
2022.acl-srw.4
Original:
2022.acl-srw.4v1
Version 2:
2022.acl-srw.4v2
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Samuel Louvan, Andrea Madotto, Brielen Madureira
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
31–43
URL:
https://aclanthology.org/2022.acl-srw.4
Cite (ACL):
Fatma Elsafoury. 2022. Darkness can not drive out darkness: Investigating Bias in Hate Speech Detection Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 31–43, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Darkness can not drive out darkness: Investigating Bias in Hate Speech Detection Models (Elsafoury, ACL 2022)
PDF:
https://aclanthology.org/2022.acl-srw.4.pdf
Video:
https://aclanthology.org/2022.acl-srw.4.mp4