A Weakly Supervised Classifier and Dataset of White Supremacist Language

Michael Yoder; Ahmad Diab; David Brown; Kathleen M. Carley

doi:10.18653/v1/2023.acl-short.17

Abstract

We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.
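The weak supervision idea in the abstract, where each text is labeled by the domain it came from rather than by hand, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the source names, the placeholder tokens, and the small multinomial naive Bayes model are hypothetical stand-ins, since the abstract does not specify the data sources or the model used.

```python
import math
from collections import Counter

# Hypothetical source names; the abstract does not list the actual data sources.
# Texts from the white supremacist source get label 1; neutral and anti-racist
# texts both serve as negative examples (label 0).
WEAK_LABELS = {
    "white_supremacist_domain": 1,
    "neutral_domain": 0,
    "antiracist_domain": 0,
}

def weak_label(docs):
    """Turn (source, text) pairs into (text, label) pairs using only the source."""
    return [(text, WEAK_LABELS[source]) for source, text in docs]

def train_nb(examples):
    """Collect per-class token counts for a multinomial naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        n_docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, n_docs, vocab

def predict(model, text):
    """Return the label with the higher add-one-smoothed log probability."""
    counts, n_docs, vocab = model
    total_docs = sum(n_docs.values())
    scores = {}
    for label in (0, 1):
        score = math.log(n_docs[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Placeholder tokens only: "topic" stands for vocabulary shared by both sides,
# "marker" for vocabulary that appears only in the positive source.
docs = [
    ("white_supremacist_domain", "topic marker marker"),
    ("white_supremacist_domain", "marker topic"),
    ("neutral_domain", "weather sports"),
    ("antiracist_domain", "topic solidarity"),
    ("antiracist_domain", "topic equality solidarity"),
]
model = train_nb(weak_label(docs))
```

Because the shared "topic" token also appears in the anti-racist negatives, it does not by itself push a text toward the positive class in this toy setup. That is one plausible reading of how counterexamples could mitigate bias, not a reproduction of the authors' experiments.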

Anthology ID: 2023.acl-short.17
Original: 2023.acl-short.17v1
Version 2: 2023.acl-short.17v2
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 172–185
URL: https://aclanthology.org/2023.acl-short.17
DOI: 10.18653/v1/2023.acl-short.17
Cite (ACL): Michael Yoder, Ahmad Diab, David Brown, and Kathleen Carley. 2023. A Weakly Supervised Classifier and Dataset of White Supremacist Language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 172–185, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): A Weakly Supervised Classifier and Dataset of White Supremacist Language (Yoder et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-short.17.pdf
Video: https://aclanthology.org/2023.acl-short.17.mp4
