The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism

Pratik Sachdeva, Renata Barreto, Geoff Bacon, Alexander Sahn, Claudia von Vacano, Chris Kennedy

Abstract

We introduce the Measuring Hate Speech corpus, a dataset created to measure hate speech while adjusting for annotators’ perspectives. It consists of 50,070 social media comments spanning YouTube, Reddit, and Twitter, labeled by 11,143 annotators recruited from Amazon Mechanical Turk. Each observation includes 10 ordinal labels: sentiment, disrespect, insult, attacking/defending, humiliation, inferior/superior status, dehumanization, violence, genocide, and a 3-valued hate speech benchmark label. The labels are aggregated using faceted Rasch measurement theory (RMT) into a continuous score that measures each comment’s location on a hate speech spectrum. The annotation experimental design assigned comments to multiple annotators in order to yield a linked network, allowing annotator disagreement (perspective) to be statistically summarized. Annotators’ labeling strictness was estimated during the RMT scaling, projecting their perspective onto a linear measure that was adjusted for the hate speech score. Models that incorporate this annotator perspective parameter as an auxiliary input can generate label- and score-level predictions conditional on annotator perspective. The corpus includes the identity group targets of each comment (8 groups, 42 subgroups) and annotator demographics (6 groups, 40 subgroups), facilitating analyses of interactions between annotator- and comment-level identities, i.e. identity-related annotator perspective.
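To make the scaling idea concrete, here is a minimal sketch of how a rating-scale many-facet Rasch model turns a comment's score, a survey item's difficulty, and an annotator's severity into probabilities over ordinal label values. It is an illustration under stated assumptions, not the authors' fitting procedure: the function names, the parameter values, and the sign convention (higher severity lowers the expected label) are all hypothetical choices made for this example.

```python
import math


def category_probabilities(theta, item_difficulty, annotator_severity, thresholds):
    """Return P(X = k) for k = 0..K under a rating-scale many-facet Rasch model.

    theta: comment location on the latent scale (hypothetical value).
    item_difficulty: difficulty of the survey item (hypothetical value).
    annotator_severity: annotator facet; assumed here to be subtracted,
        so a larger value lowers the expected label.
    thresholds: K step thresholds shared across items (rating-scale assumption).
    """
    location = theta - item_difficulty - annotator_severity
    # Category 0 has cumulative logit 0; each later category adds one step term.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + location - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_label(probabilities):
    """Expected ordinal label implied by a category distribution."""
    return sum(k * p for k, p in enumerate(probabilities))


# Same comment and item, two annotators that differ only in severity.
thresholds = [-1.0, 0.0, 1.0]
low_severity = category_probabilities(1.0, 0.0, -0.5, thresholds)
high_severity = category_probabilities(1.0, 0.0, 0.5, thresholds)

print(expected_label(low_severity), expected_label(high_severity))
```

Under these assumptions the two annotators assign different expected labels to the same comment, which is the disagreement that the annotator facet absorbs so that the comment's score is adjusted for who labeled it.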

Anthology ID: 2022.nlperspectives-1.11
Volume: Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022
Month: June
Year: 2022
Address: Marseille, France
Editors: Gavin Abercrombie, Valerio Basile, Sara Tonelli, Verena Rieser, Alexandra Uma
Venue: NLPerspectives
Publisher: European Language Resources Association
Pages: 83–94
URL: https://aclanthology.org/2022.nlperspectives-1.11/
Cite (ACL): Pratik Sachdeva, Renata Barreto, Geoff Bacon, Alexander Sahn, Claudia von Vacano, and Chris Kennedy. 2022. The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022, pages 83–94, Marseille, France. European Language Resources Association.
Cite (Informal): The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism (Sachdeva et al., NLPerspectives 2022)
PDF: https://aclanthology.org/2022.nlperspectives-1.11.pdf
