Investigating Annotator Bias in Abusive Language Datasets

Maximilian Wich, Christian Widmer, Gerhard Hagerer, Georg Groh


Abstract
Social media platforms increasingly rely on classification models to cope with hate speech and abusive language. A key weakness of these models is their vulnerability to bias. A prevalent form of bias in hate speech and abusive language datasets is annotator bias, caused by the annotators' subjective perception and the complexity of the annotation task. In our paper, we develop a set of methods to measure annotator bias in abusive language datasets and to identify different perspectives on abusive language. We apply these methods to four different abusive language datasets. Our proposed approach supports the annotation processes of such datasets and future research addressing different perspectives on the perception of abusive language.
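The bias-measurement methods themselves are detailed in the paper. As a purely illustrative sketch (not the authors' approach), one simple proxy for annotator bias is how often each annotator deviates from the majority label; the annotator IDs and labels below are hypothetical toy data.

    # Illustrative sketch: per-annotator deviation from the majority label
    # as a rough proxy for annotator bias (not the paper's actual method).
    from collections import Counter

    # Hypothetical toy data: annotations[item][annotator] = label (0 = neutral, 1 = abusive)
    annotations = {
        "post_1": {"a1": 1, "a2": 1, "a3": 0},
        "post_2": {"a1": 0, "a2": 1, "a3": 0},
        "post_3": {"a1": 1, "a2": 1, "a3": 1},
    }

    # Majority label per item
    majority = {
        item: Counter(labels.values()).most_common(1)[0][0]
        for item, labels in annotations.items()
    }

    # Count how often each annotator disagrees with the majority
    deviation = {}
    for item, labels in annotations.items():
        for annotator, label in labels.items():
            misses, total = deviation.get(annotator, (0, 0))
            deviation[annotator] = (misses + (label != majority[item]), total + 1)

    for annotator, (misses, total) in sorted(deviation.items()):
        print(f"{annotator}: deviates from majority on {misses}/{total} items")

Agreement-based measures of this kind only hint at diverging perspectives; the paper's contribution is a more systematic set of methods for measuring such bias across datasets.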
Anthology ID:
2021.ranlp-1.170
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Note:
Pages:
1515–1525
URL:
https://aclanthology.org/2021.ranlp-1.170
Cite (ACL):
Maximilian Wich, Christian Widmer, Gerhard Hagerer, and Georg Groh. 2021. Investigating Annotator Bias in Abusive Language Datasets. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1515–1525, Held Online. INCOMA Ltd.
Cite (Informal):
Investigating Annotator Bias in Abusive Language Datasets (Wich et al., RANLP 2021)
PDF:
https://aclanthology.org/2021.ranlp-1.170.pdf
Code
mawic/annotator-bias-abusive-language