A Hybrid Confidence-Aware Framework for Arabic Toxicity Detection in Social Media

Fawzia Zaal Alanazi; Asma Mohammed Alamri; Arwa Bin Saleh; Abdullah I. Alharbi

Abstract

Automatic detection of toxic and offensive content in Arabic social media is a challenging task due to rich morphology, dialectal variation, and noisy writing styles. While transformer-based language models have achieved strong performance, they often produce uncertain predictions in borderline cases. This paper presents a hybrid framework for Arabic toxicity detection that combines a pretrained Arabic-specific transformer model with a confidence-aware rule-based mechanism. The proposed approach activates automatically induced lexical rules only when the model prediction falls within a predefined gray zone of uncertainty, preserving neural dominance while improving robustness and interpretability. Experiments conducted on a manually annotated dataset of 35,000 Arabic posts demonstrate that the hybrid approach achieves consistent improvements over the baseline model, particularly in reducing false negatives for toxic content. The results indicate that selective rule activation is an effective strategy for enhancing reliability in real-world Arabic social media moderation systems.
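The selective rule activation described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the gray-zone bounds (0.40 to 0.60), the 0.5 decision threshold, the placeholder lexicon, the whitespace tokenization, and the names `hybrid_predict`, `rule_fires`, and `Decision`. The abstract states only that lexical rules are consulted when the model's prediction falls inside a predefined gray zone, and that the neural model otherwise decides.

```python
from dataclasses import dataclass

# Hypothetical gray-zone bounds; the abstract does not give the actual values.
GRAY_LOW = 0.40
GRAY_HIGH = 0.60

# Placeholder lexicon standing in for the automatically induced lexical rules.
TOXIC_TERMS = frozenset({"badword1", "badword2"})


@dataclass
class Decision:
    label: str   # "toxic" or "non-toxic"
    source: str  # "model" or "rule"


def rule_fires(text, lexicon=TOXIC_TERMS):
    """Return True if any whitespace-separated token is in the lexicon."""
    return any(token in lexicon for token in text.split())


def hybrid_predict(text, p_toxic, low=GRAY_LOW, high=GRAY_HIGH):
    """Combine a model probability with lexical rules.

    p_toxic is the transformer's probability for the toxic class
    (assumed to be computed elsewhere). Rules are consulted only
    inside the gray zone [low, high]; outside it, or when no rule
    fires, the model's own decision is kept.
    """
    if low <= p_toxic <= high and rule_fires(text):
        return Decision("toxic", "rule")
    label = "toxic" if p_toxic >= 0.5 else "non-toxic"
    return Decision(label, "model")
```

In this sketch a rule can only push an uncertain prediction toward the toxic class, which matches the abstract's emphasis on reducing false negatives; whether the authors' rules can also override toward the non-toxic class is not stated.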

Anthology ID: 2026.abjadnlp-1.42
Volume: Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month: March
Year: 2026
Address: Rabat, Morocco
Venues: AbjadNLP | WS
Publisher: Association for Computational Linguistics
Pages: 364–370
URL: https://aclanthology.org/2026.abjadnlp-1.42/
Cite (ACL): Fawzia Zaal Alanazi, Asma Mohammed Alamri, Arwa Bin Saleh, and Abdullah I. Alharbi. 2026. A Hybrid Confidence-Aware Framework for Arabic Toxicity Detection in Social Media. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 364–370, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): A Hybrid Confidence-Aware Framework for Arabic Toxicity Detection in Social Media (Alanazi et al., AbjadNLP 2026)
PDF: https://aclanthology.org/2026.abjadnlp-1.42.pdf
