Improving Generalization of Hate Speech Detection Systems to Novel Target Groups via Domain Adaptation

Florian Ludwig; Klara Dolos; Torsten Zesch; Eleanor Hobley

doi:10.18653/v1/2022.woah-1.4

Abstract

Despite recent advances in machine-learning-based hate speech detection, classifiers still struggle to generalize to out-of-domain data samples. In this paper, we investigate how well deep learning models generalize to different target groups of hate speech under clean experimental settings. Furthermore, we assess the efficacy of three different unsupervised domain adaptation strategies for improving this generalization. Given the diversity of hate and its rapid dynamics in the online world (e.g., the emergence of new target groups such as virologists during the COVID-19 pandemic), robustly detecting hate aimed at newly identified target groups is a highly relevant research question. We show that naively trained models suffer from a target-group-specific bias, which can be reduced via domain adaptation. With domain adaptation, we achieve a relative improvement in F1-score of between 5.8% and 10.7% over baseline approaches on out-of-domain target groups of hate speech.
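The improvement reported in the abstract is relative, not absolute: it is measured as a fraction of the baseline F1-score rather than in F1 points. A minimal sketch of that arithmetic follows. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_f1: float, adapted_f1: float) -> float:
    """Return the change of adapted_f1 over baseline_f1 as a fraction of the baseline."""
    if baseline_f1 <= 0:
        raise ValueError("baseline_f1 must be positive")
    return (adapted_f1 - baseline_f1) / baseline_f1


# Hypothetical F1-scores for illustration only (not results from the paper).
baseline = 0.50  # assumed baseline score on an unseen target group
adapted = 0.55   # assumed score after domain adaptation

# An absolute gain of 0.05 F1 points on a 0.50 baseline is a 10% relative gain.
print(f"{relative_improvement(baseline, adapted):.1%}")
```

Under this reading, the same absolute gain yields a larger relative improvement when the baseline score is lower, which is worth keeping in mind when comparing gains across target groups.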

Anthology ID: 2022.woah-1.4
Volume: Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
Month: July
Year: 2022
Address: Seattle, Washington (Hybrid)
Editors: Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat
Venue: WOAH
Publisher: Association for Computational Linguistics
Pages: 29–39
URL: https://aclanthology.org/2022.woah-1.4/
DOI: 10.18653/v1/2022.woah-1.4
Cite (ACL): Florian Ludwig, Klara Dolos, Torsten Zesch, and Eleanor Hobley. 2022. Improving Generalization of Hate Speech Detection Systems to Novel Target Groups via Domain Adaptation. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 29–39, Seattle, Washington (Hybrid). Association for Computational Linguistics.
Cite (Informal): Improving Generalization of Hate Speech Detection Systems to Novel Target Groups via Domain Adaptation (Ludwig et al., WOAH 2022)
PDF: https://aclanthology.org/2022.woah-1.4.pdf
Video: https://aclanthology.org/2022.woah-1.4.mp4
Data: HateXplain
