Mitigating Biases in Hate Speech Detection from A Causal Perspective

Zhehao Zhang, Jiaao Chen, Diyi Yang


Abstract
Many hate speech detectors are now built to automatically detect hateful content. However, their training sets are sometimes skewed towards certain stereotypes (e.g., race- or religion-related ones). As a result, the detectors are prone to relying on shortcuts for their predictions. Previous work mainly focuses on token-level analysis and relies heavily on human experts' annotations to identify spurious correlations, which is not only costly but also incapable of discovering higher-level artifacts. In this work, we use grammar induction to find grammar patterns for hate speech and analyze this phenomenon from a causal perspective. Concretely, we categorize and verify different biases based on their spuriousness and their influence on model predictions. We then propose two mitigation approaches, Multi-Task Intervention and Data-Specific Intervention, based on these confounders. Experiments conducted on 9 hate speech datasets demonstrate the effectiveness of our approaches.
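To illustrate the kind of token-level spurious correlation the abstract contrasts with its grammar-level analysis, the following is a minimal, hypothetical sketch (not the paper's method): it scores each token by its pointwise mutual information with the hate label on a toy corpus, surfacing identity terms that co-occur with the label even in benign mentions.

```python
import math
from collections import Counter

def token_label_pmi(examples, label="hate"):
    """PMI between each token and a target label.

    High-PMI tokens are candidate shortcut features: cues a detector
    could rely on instead of the actual hateful content.
    """
    token_counts = Counter()       # examples containing the token
    token_label_counts = Counter() # ... that also carry the label
    n = len(examples)
    n_label = sum(1 for _, y in examples if y == label)
    for text, y in examples:
        for tok in set(text.lower().split()):
            token_counts[tok] += 1
            if y == label:
                token_label_counts[tok] += 1
    p_label = n_label / n
    pmi = {}
    for tok, c in token_counts.items():
        p_joint = token_label_counts[tok] / n
        if p_joint > 0:  # skip tokens never seen with the label
            pmi[tok] = math.log(p_joint / ((c / n) * p_label))
    return pmi

# Toy corpus: the group term "groupX" co-occurs with the hate label
# even in a benign mention, creating a spurious correlation.
data = [
    ("groupX are terrible people", "hate"),
    ("I hate groupX so much", "hate"),
    ("groupX held a festival today", "not-hate"),
    ("the weather is nice today", "not-hate"),
]
scores = token_label_pmi(data)
```

In this toy data, "groupx" receives a positive PMI despite appearing in a benign sentence, which is exactly the failure mode such token-level audits flag; the abstract's point is that these analyses miss higher-level (e.g., grammatical) artifacts.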
Anthology ID:
2023.findings-emnlp.440
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6610–6625
URL:
https://aclanthology.org/2023.findings-emnlp.440
DOI:
10.18653/v1/2023.findings-emnlp.440
Cite (ACL):
Zhehao Zhang, Jiaao Chen, and Diyi Yang. 2023. Mitigating Biases in Hate Speech Detection from A Causal Perspective. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 6610–6625, Singapore. Association for Computational Linguistics.
Cite (Informal):
Mitigating Biases in Hate Speech Detection from A Causal Perspective (Zhang et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.440.pdf