Offensive Language Detection in Arabizi

Imene Bensalem, Meryem Mout, Paolo Rosso


Abstract
Detecting offensive language in under-resourced languages presents a significant real-world challenge for social media platforms. This paper is the first work focused on offensive language detection in Arabizi, an under-explored topic in an under-resourced form of Arabic. We present, for the first time, a comprehensive and critical overview of the existing work on the topic. In addition, we carry out experiments using different BERT-like models and show that offensive language in Arabizi can be detected with high accuracy. Through a thorough analysis of the results, we highlight the complexities introduced by dialect variation and out-of-domain generalization. Our experiments use a dataset that we constructed by leveraging existing, albeit limited, resources. To facilitate further research, we make this dataset publicly available to the research community.
Anthology ID:
2023.arabicnlp-1.36
Volume:
Proceedings of ArabicNLP 2023
Month:
December
Year:
2023
Address:
Singapore (Hybrid)
Editors:
Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues:
ArabicNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
423–434
URL:
https://aclanthology.org/2023.arabicnlp-1.36
DOI:
10.18653/v1/2023.arabicnlp-1.36
Cite (ACL):
Imene Bensalem, Meryem Mout, and Paolo Rosso. 2023. Offensive Language Detection in Arabizi. In Proceedings of ArabicNLP 2023, pages 423–434, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Offensive Language Detection in Arabizi (Bensalem et al., ArabicNLP-WS 2023)
PDF:
https://aclanthology.org/2023.arabicnlp-1.36.pdf
Video:
https://aclanthology.org/2023.arabicnlp-1.36.mp4