An Expert Annotated Dataset for the Detection of Online Misogyny

Ella Guest, Bertie Vidgen, Alexandros Mittos, Nishanth Sastry, Gareth Tyson, Helen Margetts


Abstract
Online misogyny is a pernicious social problem that risks making online platforms toxic and unwelcoming to women. We present a new hierarchical taxonomy for online misogyny, as well as an expert-labelled dataset to enable automatic classification of misogynistic content. The dataset consists of 6,567 labels for Reddit posts and comments. As previous research has found that untrained crowdsourced annotators struggle to identify misogyny, we hired and trained annotators and provided them with robust annotation guidelines. We report baseline classification performance on the binary classification task, achieving an accuracy of 0.93 and an F1 of 0.43. The codebook and datasets are made freely available for future researchers.
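The gap between the reported accuracy and F1 can look surprising at first. A minimal sketch of how the two metrics are computed from a binary confusion matrix is given below. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the dataset; they only illustrate that, when the positive class is rare, accuracy can be high while F1 stays low. Whether class imbalance explains the figures in the abstract is an assumption here.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 for the positive class.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT from the paper): 1000 items, 100 of them positive.
metrics = binary_metrics(tp=30, fp=10, fn=70, tn=890)
print(metrics)
```

With these illustrative counts, most of the correct predictions come from the large negative class, so accuracy is dominated by true negatives, whereas F1 depends only on how well the rare positive class is recovered.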
Anthology ID:
2021.eacl-main.114
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1336–1350
URL:
https://aclanthology.org/2021.eacl-main.114
DOI:
10.18653/v1/2021.eacl-main.114
Cite (ACL):
Ella Guest, Bertie Vidgen, Alexandros Mittos, Nishanth Sastry, Gareth Tyson, and Helen Margetts. 2021. An Expert Annotated Dataset for the Detection of Online Misogyny. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1336–1350, Online. Association for Computational Linguistics.
Cite (Informal):
An Expert Annotated Dataset for the Detection of Online Misogyny (Guest et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.114.pdf
Dataset:
2021.eacl-main.114.Dataset.zip
Code:
ellamguest/online-misogyny-eacl2021