Proceedings of the 1st Workshop on CounterSpeech for Online Abuse (CS4OA)

Yi-Ling Chung, Helena Bonaldi, Gavin Abercrombie, Marco Guerini (Editors)

Anthology ID:
Prague, Czechia
Association for Computational Linguistics

From Generic to Personalized: Investigating Strategies for Generating Targeted Counter Narratives against Hate Speech
Mekselina Doğanç | Ilia Markov

The spread of hate speech (HS) in the digital age poses significant challenges, with online platforms becoming breeding grounds for harmful content. While many natural language processing (NLP) studies have focused on identifying hate speech, few have explored the generation of counter narratives (CNs) as a means to combat it. Previous studies have shown that computational models often generate CNs that are dull and generic, and therefore do not resonate with hate speech authors. In this paper, we explore the personalization capabilities of computational models for generating more targeted and engaging CNs. This paper investigates various strategies for incorporating author profiling information into GPT-2 and GPT-3.5 models to enhance the personalization of CNs to combat online hate speech. We investigate the effectiveness of incorporating author profiling aspects, more specifically the age and gender information of HS authors, in tailoring CNs to HS spreaders. We discuss the challenges, opportunities, and future directions for incorporating user profiling information into CN interventions.

Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization
Helena Bonaldi | Giuseppe Attanasio | Debora Nozza | Marco Guerini

Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narrative generation. As a result, overfitting to training-specific terms is discouraged, yielding more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.

Distilling Implied Bias from Hate Speech for Counter Narrative Selection
Nami Akazawa | Serra Sinem Tekiroğlu | Marco Guerini

Hate speech is a critical problem in our society, and social media platforms are often an amplifier for this phenomenon. Recently, the use of Counter Narratives (informative and non-aggressive responses) has been proposed as a viable solution to counter hateful content that goes beyond simple detection-removal strategies. In this paper, we present a novel approach along this line of research, which utilizes the implied statement (bias) expressed in the hate speech to retrieve an appropriate counter narrative. To this end, we first trained and tested several LMs that, given a hateful post, generate the underlying bias and the target group. Then, for the counter narrative selection task, we experimented with several methodologies that either use or do not use the implied bias during the process. Experiments show that using the target group information allows the system to better focus on relevant content, and that using the implied statement for selecting counter narratives outperforms the corresponding standard approach that does not use it. To our knowledge, this is the first attempt to build an automatic selection tool that uses hate speech implied bias to drive Counter Narrative selection.

Just Collect, Don’t Filter: Noisy Labels Do Not Improve Counterspeech Collection for Languages Without Annotated Resources
Pauline Möhle | Matthias Orlikowski | Philipp Cimiano

Counterspeech on social media is rare. Consequently, it is difficult to collect naturally occurring examples, in particular for languages without annotated datasets. In this work, we study methods to increase the relevance of social media samples for counterspeech annotation when we lack annotated resources. We use the example of sourcing German data for counterspeech annotations from Twitter. We monitor tweets from German politicians and activists to collect replies. To select relevant replies, we (a) find replies that match German abusive keywords or (b) label replies for counterspeech using a multilingual classifier fine-tuned on English data. For both approaches and a baseline setting, we annotate a random sample and use bootstrap sampling to estimate the amount of counterspeech. We find that neither the multilingual model nor the keyword approach achieves significantly higher counts of true counterspeech than the baseline. Thus, keyword lists or multilingual classifiers are likely not worth the added complexity beyond purposive data collection: even without additional filtering, we gather a meaningful sample with 7.4% true counterspeech.
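The bootstrap estimate mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each annotated reply carries a binary label (1 = counterspeech, 0 = not) and uses a percentile bootstrap; the function names, the resample count, and the toy label lists are hypothetical and are not taken from the paper or its data.

```python
import random


def _percentile_interval(values, alpha):
    """Return the (alpha/2, 1 - alpha/2) percentile bounds of a list of floats."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo_idx = int(round((alpha / 2) * last))
    hi_idx = int(round((1 - alpha / 2) * last))
    return ordered[lo_idx], ordered[hi_idx]


def bootstrap_proportion(labels, n_resamples=10_000, alpha=0.05, seed=0):
    """Estimate the share of counterspeech (label 1) with a percentile bootstrap CI.

    labels: hypothetical list of 0/1 annotations for a random sample of replies.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    rng = random.Random(seed)
    n = len(labels)
    # Resample the annotated replies with replacement and record the positive rate.
    estimates = [sum(rng.choices(labels, k=n)) / n for _ in range(n_resamples)]
    lower, upper = _percentile_interval(estimates, alpha)
    return sum(labels) / n, lower, upper


def bootstrap_difference(labels_a, labels_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for rate(labels_a) - rate(labels_b).

    If the interval contains 0, setting A is not clearly better than setting B.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        a = rng.choices(labels_a, k=len(labels_a))
        b = rng.choices(labels_b, k=len(labels_b))
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    return _percentile_interval(diffs, alpha)


if __name__ == "__main__":
    # Toy, made-up annotations: NOT the paper's data.
    baseline = [1] * 7 + [0] * 93
    filtered = [1] * 9 + [0] * 91
    print(bootstrap_proportion(baseline))
    print(bootstrap_difference(filtered, baseline))
```

Comparing a filtering strategy against the baseline through the interval of the difference, rather than through two separate intervals, is one common way to judge whether the strategy yields significantly more counterspeech; the paper's exact bootstrap procedure may differ.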

What Makes Good Counterspeech? A Comparison of Generation Approaches and Evaluation Metrics
Yi Zheng | Björn Ross | Walid Magdy

Counterspeech has been proposed as a solution to the proliferation of online hate. Research has shown that natural language processing (NLP) approaches could generate such counterspeech automatically, but there are competing ideas for how NLP models might be used for this task and a variety of evaluation metrics whose relationship to one another is unclear. We test three different approaches and collect ratings of the generated counterspeech for 1,740 tweet-participant pairs to systematically compare the counterspeech on three aspects: quality, effectiveness, and user preferences. We examine which model performs best on which metric and which aspects of counterspeech predict user preferences. A free-form text generation approach using ChatGPT performs most consistently well, though its generations are occasionally unspecific and repetitive. In our experiment, participants' preferences for counterspeech are predicted by the quality of the counterspeech, not its perceived effectiveness. The results can help future research approach counterspeech evaluation more systematically.