Klara Dolos


2024

pdf bib
Unraveling the Dynamics of Semi-Supervised Hate Speech Detection: The Impact of Unlabeled Data Characteristics and Pseudo-Labeling Strategies
Florian Ludwig | Klara Dolos | Ana Alves-Pinto | Torsten Zesch
Findings of the Association for Computational Linguistics: EACL 2024

Despite advances in machine learning based hate speech detection, the need for larges amounts of labeled training data for state-of-the-art approaches remains a challenge for their application. Semi-supervised learning addresses this problem by leveraging unlabeled data and thus reducing the amount of annotated data required. Underlying this approach is the assumption that labeled and unlabeled data follow similar distributions. This assumption however may not always hold, with consequences for real world applications. We address this problem by investigating the dynamics of pseudo-labeling, a commonly employed form of semi-supervised learning, in the context of hate speech detection. Concretely we analysed the influence of data characteristics and of two strategies for selecting pseudo-labeled samples: threshold- and ratio-based. The results show that the influence of data characteristics on the pseudo-labeling performances depends on other factors, such as pseudo-label selection strategies or model biases. Furthermore, the effectiveness of pseudo-labeling in classification performance is determined by the interaction between the number, hate ratio and accuracy of the selected pseudo-labels. Analysis of the results suggests an advantage of the threshold-based approach when labeled and unlabeled data arise from the same domain, whilst the ratio-based approach may be recommended in the opposite situation.

2022

pdf bib
Improving Generalization of Hate Speech Detection Systems to Novel Target Groups via Domain Adaptation
Florian Ludwig | Klara Dolos | Torsten Zesch | Eleanor Hobley
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Despite recent advances in machine learning based hate speech detection, classifiers still struggle with generalizing knowledge to out-of-domain data samples. In this paper, we investigate the generalization capabilities of deep learning models to different target groups of hate speech under clean experimental settings. Furthermore, we assess the efficacy of three different strategies of unsupervised domain adaptation to improve these capabilities. Given the diversity of hate and its rapid dynamics in the online world (e.g. the evolution of new target groups like virologists during the COVID-19 pandemic), robustly detecting hate aimed at newly identified target groups is a highly relevant research question. We show that naively trained models suffer from a target group specific bias, which can be reduced via domain adaptation. We were able to achieve a relative improvement of the F1-score between 5.8% and 10.7% for out-of-domain target groups of hate speech compared to baseline approaches by utilizing domain adaptation.