Using Transfer-based Language Models to Detect Hateful and Offensive Language Online

Vebjørn Isaksen, Björn Gambäck


Abstract
Distinguishing hate speech from non-hate offensive language is challenging, as hate speech does not always include offensive slurs and offensive language does not always express hate. Here, four deep learners based on the Bidirectional Encoder Representations from Transformers (BERT), with either general or domain-specific language models, were tested against two datasets containing tweets labelled as either ‘Hateful’, ‘Normal’ or ‘Offensive’. The results indicate that the attention-based models profoundly confuse hate speech with offensive and normal language. However, the pre-trained models outperform state-of-the-art results in terms of accurately predicting the hateful instances.
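As an illustration of the evaluation setting the abstract describes, the following is a minimal sketch of how confusion between the three classes and per-class recall can be tallied. It assumes invented toy labels (not the paper's data or results), and the helper names `confusion_matrix` and `per_class_recall` are hypothetical, not taken from the authors' code.

```python
from collections import Counter

# The three labels named in the abstract.
LABELS = ["Hateful", "Normal", "Offensive"]


def confusion_matrix(gold, predicted, labels=LABELS):
    """Return matrix[g][p]: number of items with gold label g predicted as p."""
    counts = Counter(zip(gold, predicted))
    return {g: {p: counts[(g, p)] for p in labels} for g in labels}


def per_class_recall(matrix):
    """Recall per gold label: correct predictions divided by all gold instances."""
    recall = {}
    for g, row in matrix.items():
        total = sum(row.values())
        recall[g] = row[g] / total if total else 0.0
    return recall


# Toy data, invented for illustration only (assumption, not from the paper).
gold = ["Hateful", "Hateful", "Hateful", "Hateful",
        "Offensive", "Offensive", "Normal", "Normal"]
pred = ["Hateful", "Offensive", "Normal", "Hateful",
        "Offensive", "Hateful", "Normal", "Normal"]

matrix = confusion_matrix(gold, pred)
recall = per_class_recall(matrix)

for label in LABELS:
    print(label, matrix[label], round(recall[label], 2))
```

Each row of the matrix shows where the instances of one gold class ended up, so off-diagonal counts in the ‘Hateful’ row correspond to hateful tweets mistaken for offensive or normal language.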
Anthology ID:
2020.alw-1.3
Volume:
Proceedings of the Fourth Workshop on Online Abuse and Harms
Month:
November
Year:
2020
Address:
Online
Editors:
Seyi Akiwowo, Bertie Vidgen, Vinodkumar Prabhakaran, Zeerak Waseem
Venue:
ALW
Publisher:
Association for Computational Linguistics
Pages:
16–27
URL:
https://aclanthology.org/2020.alw-1.3
DOI:
10.18653/v1/2020.alw-1.3
Cite (ACL):
Vebjørn Isaksen and Björn Gambäck. 2020. Using Transfer-based Language Models to Detect Hateful and Offensive Language Online. In Proceedings of the Fourth Workshop on Online Abuse and Harms, pages 16–27, Online. Association for Computational Linguistics.
Cite (Informal):
Using Transfer-based Language Models to Detect Hateful and Offensive Language Online (Isaksen & Gambäck, ALW 2020)
PDF:
https://aclanthology.org/2020.alw-1.3.pdf
Video:
https://slideslive.com/38939536
Data
BookCorpus, Hate Speech