Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language

Hala Mulki, Bilal Ghanem


Abstract
Online misogyny has become an increasing worry for Arab women who experience gender-based online abuse on a daily basis. Misogyny automatic detection systems can assist in the prohibition of anti-women Arabic toxic content. Developing such systems is hindered by the lack of the Arabic misogyny benchmark datasets. In this paper, we introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi) to be the first benchmark dataset for Arabic misogyny. We further provide a detailed review of the dataset creation and annotation phases. The consistency of the annotations for the proposed dataset was emphasized through inter-rater agreement evaluation measures. Moreover, Let-Mi was used as an evaluation dataset through binary/multi-/target classification tasks conducted by several state-of-the-art machine learning systems along with Multi-Task Learning (MTL) configuration. The obtained results indicated that the performances achieved by the used systems are consistent with state-of-the-art results for languages other than Arabic, while employing MTL improved the performance of the misogyny/target classification tasks.
Anthology ID:
2021.wanlp-1.16
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Editors:
Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue:
WANLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
154–163
Language:
URL:
https://aclanthology.org/2021.wanlp-1.16
DOI:
Bibkey:
Cite (ACL):
Hala Mulki and Bilal Ghanem. 2021. Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 154–163, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language (Mulki & Ghanem, WANLP 2021)
Copy Citation:
PDF:
https://aclanthology.org/2021.wanlp-1.16.pdf
Code
 bilalghanem/let-mi
Data
LeT-Mi