A Modular Taxonomy for Hate Speech Definitions and Its Impact on Zero-Shot LLM Classification Performance

Matteo Melis; Gabriella Lapesa; Dennis Assenmacher

Abstract

Detecting harmful content is a crucial task in the landscape of NLP applications for Social Good, with hate speech being one of its most dangerous forms. But what do we mean by hate speech, how can we define it, and how does prompting with different definitions of hate speech affect model performance? The contribution of this work is twofold. At the theoretical level, we address the ambiguity surrounding hate speech by collecting and analyzing existing definitions from the literature. We organize these definitions into a taxonomy of 14 conceptual elements: building blocks that capture different aspects of hate speech definitions, such as references to the target of hate. At the experimental level, we employ the collection of definitions in a systematic zero-shot evaluation of three LLMs on three hate speech datasets representing different types of data (synthetic, human-in-the-loop, and real-world). We find that choosing different definitions, i.e., definitions with a different degree of specificity in terms of encoded elements, affects model performance, but this effect is not consistent across all architectures.

Anthology ID: 2025.woah-1.45
Volume: Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues: WOAH | WS
Publisher: Association for Computational Linguistics
Pages: 490–521
URL: https://aclanthology.org/2025.woah-1.45/
Cite (ACL): Matteo Melis, Gabriella Lapesa, and Dennis Assenmacher. 2025. A Modular Taxonomy for Hate Speech Definitions and Its Impact on Zero-Shot LLM Classification Performance. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 490–521, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): A Modular Taxonomy for Hate Speech Definitions and Its Impact on Zero-Shot LLM Classification Performance (Melis et al., WOAH 2025)
PDF: https://aclanthology.org/2025.woah-1.45.pdf
