The Annotators Agree to Not Agree on the Fine-grained Annotation of Hate-speech against Women in Algerian Dialect Comments

Imane Guellil, Yousra Houichi, Sara Chennoufi, Mohamed Boubred, Anfal Yousra Boucetta, Faical Azouaou


Abstract
A significant number of research studies have been presented for detecting hate speech in social media during the last few years. However, the majority of these studies are in English. Only a few studies focus on Arabic and its dialects (especially the Algerian dialect) with a smaller number of them targeting sexism detection (or hate speech against women). Even the works that have been proposed on Arabic sexism detection consider two classes only (hateful and non-hateful), and three classes(adding the neutral class) in the best scenario. This paper aims to propose the first fine-grained corpus focusing on 13 classes. However, given the challenges related to hate speech and fine-grained annotation, the Kappa metric is relatively low among the annotators (i.e. 35% ). This work in progress proposes three main contributions: 1) Annotation of different categories related to hate speech such as insults, vulgar words or hate in general. 2) Annotation of 10,000 comments, in Arabic and Algerian dialects, automatically extracted from Youtube. 3) High-lighting the challenges related to manual annotation such as subjectivity, risk of bias, lack of annotation guidelines, etc
Anthology ID:
2024.rail-1.15
Volume:
Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Venues:
RAIL | WS
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
133–139
Language:
URL:
https://aclanthology.org/2024.rail-1.15
DOI:
Bibkey:
Cite (ACL):
Imane Guellil, Yousra Houichi, Sara Chennoufi, Mohamed Boubred, Anfal Yousra Boucetta, and Faical Azouaou. 2024. The Annotators Agree to Not Agree on the Fine-grained Annotation of Hate-speech against Women in Algerian Dialect Comments. In Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024, pages 133–139, Torino, Italia. ELRA and ICCL.
Cite (Informal):
The Annotators Agree to Not Agree on the Fine-grained Annotation of Hate-speech against Women in Algerian Dialect Comments (Guellil et al., RAIL-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.rail-1.15.pdf