“Zo Grof !”: A Comprehensive Corpus for Offensive and Abusive Language in Dutch

Ward Ruitenbeek, Victor Zwart, Robin Van Der Noord, Zhenja Gnezdilov, Tommaso Caselli


Abstract
This paper presents a comprehensive corpus for the study of socially unacceptable language in Dutch. The corpus extends and revises an existing resource with more data and introduces a new annotation dimension for offensive language, making it a unique resource in the Dutch language panorama. Each language phenomenon (abusive and offensive language) in the corpus has been annotated with a multi-layer annotation scheme modelling the explicitness and the target(s) of the message. We have conducted a new set of experiments with different classification algorithms on all annotation dimensions. Monolingual pre-trained language models prove to be the best systems, obtaining a macro-average F1 of 0.828 for binary classification of offensive language and 0.579 for the targets of offensive messages. Furthermore, the best system obtains a macro-average F1 of 0.667 for distinguishing between abusive and offensive messages.
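The scores above are macro-averaged F1: the F1 score is computed separately for each class and then averaged without weighting by class frequency, so minority classes count as much as the majority class. The following is a minimal, hedged sketch of that computation in plain Python. It is not the authors' evaluation code, and the label names `OFF`/`NOT` in the usage example are hypothetical placeholders rather than the corpus's actual tag set.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed an instance of class g
    f1_scores = []
    for label in labels:
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        precision = tp[label] / p_denom if p_denom else 0.0
        recall = tp[label] / r_denom if r_denom else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    # Macro average: every class contributes equally, regardless of its size.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy example with two labels.
gold = ["OFF", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))  # F1(OFF)=0.6667, F1(NOT)=0.8 -> 0.7333
```

Treating an undefined precision or recall as 0 is one common convention. Other toolkits may handle that edge case differently.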
Anthology ID:
2022.woah-1.5
Volume:
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
Month:
July
Year:
2022
Address:
Seattle, Washington (Hybrid)
Editors:
Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat
Venue:
WOAH
Publisher:
Association for Computational Linguistics
Pages:
40–56
URL:
https://aclanthology.org/2022.woah-1.5
DOI:
10.18653/v1/2022.woah-1.5
Cite (ACL):
Ward Ruitenbeek, Victor Zwart, Robin Van Der Noord, Zhenja Gnezdilov, and Tommaso Caselli. 2022. “Zo Grof !”: A Comprehensive Corpus for Offensive and Abusive Language in Dutch. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 40–56, Seattle, Washington (Hybrid). Association for Computational Linguistics.
Cite (Informal):
“Zo Grof !”: A Comprehensive Corpus for Offensive and Abusive Language in Dutch (Ruitenbeek et al., WOAH 2022)
PDF:
https://aclanthology.org/2022.woah-1.5.pdf
Video:
https://aclanthology.org/2022.woah-1.5.mp4
Code:
tommasoc80/dalc