Explain the Flag: Contextualizing Hate Speech Beyond Censorship

Jason Liartis; Eirini Kaldeli; Lamprini Gyftokosta; Eleftherios Chelioudakis; Orfeas Menis Mastromichalakis

Abstract

Hate, derogatory, and offensive speech remains a persistent challenge on online platforms and in public discourse. While automated detection systems are widely used, most focus on censorship or removal, raising concerns about transparency and freedom of expression, and limiting opportunities to explain why content is harmful. To address these issues, explanatory approaches have emerged as a promising solution, aiming to make hate speech detection more transparent, accountable, and informative. In this paper, we present a hybrid approach that combines Large Language Models (LLMs) with three newly created and curated vocabularies to detect and explain hate speech in English, French, and Greek. Our system captures both inherently derogatory expressions tied to identity characteristics and direct group-targeted content through two complementary pipelines: one that detects and disambiguates problematic terms using the curated vocabularies, and one that leverages LLMs as context-aware evaluators of group-targeting content. The outputs are fused into grounded explanations that clarify why content is flagged. Human evaluation shows that our hybrid approach is accurate, with high-quality explanations, outperforming LLM-only baselines.

Anthology ID: 2026.findings-acl.406
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8305–8325
URL: https://aclanthology.org/2026.findings-acl.406/
Cite (ACL): Jason Liartis, Eirini Kaldeli, Lamprini Gyftokosta, Eleftherios Chelioudakis, and Orfeas Menis Mastromichalakis. 2026. Explain the Flag: Contextualizing Hate Speech Beyond Censorship. In Findings of the Association for Computational Linguistics: ACL 2026, pages 8305–8325, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Explain the Flag: Contextualizing Hate Speech Beyond Censorship (Liartis et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.406.pdf
Checklist: 2026.findings-acl.406.checklist.pdf
