Bias Discovery within Human Raters: A Case Study of the Jigsaw Dataset

Marta Marchiori Manerba, Riccardo Guidotti, Lucia Passaro, Salvatore Ruggieri


Abstract
Understanding and quantifying the bias introduced by human annotation of data is a crucial problem for trustworthy supervised learning. Recently, a perspectivist trend has emerged in the NLP community, highlighting the inadequacy of previous aggregation schemes, which assume the existence of a single ground truth. This assumption is particularly problematic for sensitive tasks involving subjective human judgments, such as toxicity detection. To address these issues, we propose a preliminary approach for bias discovery within human raters by exploring individual ratings for specific sensitive topics annotated in the texts. The object of our analysis is the Jigsaw dataset, a collection of comments designed to challenge online toxicity identification.
Anthology ID:
2022.nlperspectives-1.4
Volume:
Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Gavin Abercrombie, Valerio Basile, Sara Tonelli, Verena Rieser, Alexandra Uma
Venue:
NLPerspectives
Publisher:
European Language Resources Association
Pages:
26–31
URL:
https://aclanthology.org/2022.nlperspectives-1.4
Cite (ACL):
Marta Marchiori Manerba, Riccardo Guidotti, Lucia Passaro, and Salvatore Ruggieri. 2022. Bias Discovery within Human Raters: A Case Study of the Jigsaw Dataset. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 26–31, Marseille, France. European Language Resources Association.
Cite (Informal):
Bias Discovery within Human Raters: A Case Study of the Jigsaw Dataset (Marchiori Manerba et al., NLPerspectives 2022)
PDF:
https://aclanthology.org/2022.nlperspectives-1.4.pdf
Code:
martamarchiori/bias-discovery-in-human-raters