Adult Content Detection on Arabic Twitter: Analysis and Experiments

Hamdy Mubarak, Sabit Hassan, Ahmed Abdelali


Abstract
With Twitter being one of the most popular social media platforms in the Arab region, it is not surprising to find accounts that post adult content in Arabic tweets, even though these platforms discourage users from posting such content. In this paper, we present a dataset of Twitter accounts that post adult content. We perform an in-depth analysis of the nature of this data and contrast it with normal tweet content. Additionally, we present extensive experiments with traditional machine learning models, deep neural networks, and contextual embeddings to identify such accounts. We show that from user information alone, we can identify such accounts with an F1 score of 94.7% (macro average). With the addition of only one tweet as input, the F1 score rises to 96.8%.
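The scores above are macro-averaged F1. F1 is computed separately for each class, as F1 = 2TP / (2TP + FP + FN), and the per-class scores are then averaged without weighting by class size, so a small class counts as much as a large one. The following is a minimal Python sketch of that computation. It assumes a binary setup with string labels; the function name `macro_f1` and the labels `"adult"` and `"normal"` are illustrative and are not taken from the paper's code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one example")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed an instance of t

    # Per-class F1 = 2TP / (2TP + FP + FN); a class with no support and no predictions scores 0.
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)

    # Macro average: every class weighted equally, regardless of its size.
    return sum(f1s) / len(f1s)


# Hypothetical toy labels: two adult accounts, four normal accounts.
gold = ["adult", "adult", "normal", "normal", "normal", "normal"]
pred = ["adult", "normal", "normal", "normal", "normal", "adult"]
print(macro_f1(gold, pred))  # adult F1 = 0.5, normal F1 = 0.75, macro = 0.625
```

Because the average is unweighted, a classifier that ignored the smaller class would score poorly on it and be penalized heavily, which is why macro F1 is a common choice when adult-content accounts are a minority.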
Anthology ID: 2021.wanlp-1.14
Volume: Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month: April
Year: 2021
Address: Kyiv, Ukraine (Virtual)
Editors: Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue: WANLP
Publisher: Association for Computational Linguistics
Pages: 136–144
URL: https://aclanthology.org/2021.wanlp-1.14
Cite (ACL): Hamdy Mubarak, Sabit Hassan, and Ahmed Abdelali. 2021. Adult Content Detection on Arabic Twitter: Analysis and Experiments. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 136–144, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal): Adult Content Detection on Arabic Twitter: Analysis and Experiments (Mubarak et al., WANLP 2021)
PDF: https://aclanthology.org/2021.wanlp-1.14.pdf