Elena Catanese


2024

Towards a Hate Speech Index with Attention-based LSTMs and XLM-RoBERTa
Mauro Bruno | Elena Catanese | Francesco Ortame
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

The uncontrolled diffusion of hate speech on social media requires robust detection mechanisms to measure its harmful impact. Analyzing texts from X (formerly Twitter) is challenging due to slang, neologisms, and sarcasm, which require advanced and intelligent detection approaches. While sophisticated models like large language models (LLMs) demonstrate impressive accuracy, their prohibitive inference times make it impractical to process millions of tweets. Therefore, we propose a mixed approach using a bidirectional long short-term memory model with an added attention mechanism (AT-BiLSTM) for improved natural language understanding. We benchmark this model against a standard BiLSTM model and a fine-tuned multilingual robustly optimized BERT (XLM-RoBERTa). The task of hate speech detection has been extensively explored in the EVALITA campaigns, which have achieved impressive results. Building on this foundation, we aim to develop a robust classifier to predict the content of approximately 20 million tweets related to immigration. The performance of our models is comparable to the top entries from the EVALITA campaigns, and we show the effects of training different networks on the dynamics of the Hate Speech Index (HSI). We also utilize a custom labeled dataset for benchmarking and training.
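
The abstract does not spell out how the attention mechanism in AT-BiLSTM is formulated, so the following is only a minimal sketch of one common variant, dot-product attention pooling over per-token hidden states. The function name `attention_pool`, the `query` vector, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
import math

def attention_pool(hidden_states, query):
    """Collapse per-token vectors into one sentence vector (illustrative only)."""
    # Score each token by the dot product of its hidden state with the query.
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in hidden_states]
    # Softmax over tokens, shifted by the maximum score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states, one output value per dimension.
    dim = len(hidden_states[0])
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return pooled, weights

# Toy stand-in for BiLSTM outputs: three tokens, two dimensions each.
states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical query vector; learned in a real model
pooled, weights = attention_pool(states, query)
```

In a trained network the hidden states would come from the BiLSTM and the query would be a learned parameter; the pooled vector would then feed the classification layer in place of, for example, the final hidden state alone.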