The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis

Ibai Guillén-Pacho; Arianna Longo; Marco Antonio Stranisci; Viviana Patti; Carlos Badenes-Olmedo

Abstract

This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in Italian and Spanish news headlines. VIRC comprises 921 headlines, manually annotated for vulnerable identities, dangerous discourse, derogatory expressions, and entities. Our experiments reveal that large language models (LLMs) struggle significantly with the fine-grained identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind in these languages, offering a richer annotation schema compared to existing corpora. The insights derived from VIRC can inform the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media, promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further enhance its comprehensiveness and reliability.

Anthology ID: 2024.clicit-1.50
Volume: Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
Month: December
Year: 2024
Address: Pisa, Italy
Editors: Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue: CLiC-it
Publisher: CEUR Workshop Proceedings
Pages: 417–424
URL: https://aclanthology.org/2024.clicit-1.50/
Cite (ACL): Ibai Guillén-Pacho, Arianna Longo, Marco Antonio Stranisci, Viviana Patti, and Carlos Badenes-Olmedo. 2024. The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis. In Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), pages 417–424, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal): The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis (Guillén-Pacho et al., CLiC-it 2024)
PDF: https://aclanthology.org/2024.clicit-1.50.pdf
