Towards a Hate Speech Index with Attention-based LSTMs and XLM-RoBERTa

Mauro Bruno; Elena Catanese; Francesco Ortame

Abstract

The uncontrolled diffusion of hate speech on social media requires robust detection mechanisms to measure its harmful impact. Analyzing texts from X (formerly Twitter) is challenging due to slang, neologisms, and sarcasm, which require advanced and intelligent detection approaches. While sophisticated models like large language models (LLMs) demonstrate impressive accuracy, their prohibitive inference times make it impractical to process millions of tweets. Therefore, we propose a mixed approach using a bidirectional long short-term memory model with an added attention mechanism (AT-BiLSTM) for improved natural language understanding. We benchmark this model against a standard BiLSTM model and a fine-tuned multilingual robustly optimized BERT (RoBERTa). The task of hate speech detection has been extensively explored in the EVALITA campaigns, which have achieved impressive results. Building on this foundation, we aim to develop a robust classifier to predict the content of approximately 20 million tweets related to immigration. The performance of our models is comparable to the top entries from the EVALITA campaigns, and we show the effects of training different networks on the dynamics of the Hate Speech Index (HSI). We also utilize a custom labeled dataset for benchmarking and training.

Anthology ID: 2024.clicit-1.14
Volume: Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
Month: December
Year: 2024
Address: Pisa, Italy
Editors: Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue: CLiC-it
Publisher: CEUR Workshop Proceedings
Pages: 106–113
URL: https://aclanthology.org/2024.clicit-1.14/
Cite (ACL): Mauro Bruno, Elena Catanese, and Francesco Ortame. 2024. Towards a Hate Speech Index with Attention-based LSTMs and XLM-RoBERTa. In Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), pages 106–113, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal): Towards a Hate Speech Index with Attention-based LSTMs and XLM-RoBERTa (Bruno et al., CLiC-it 2024)
PDF: https://aclanthology.org/2024.clicit-1.14.pdf
