Generalizable Implicit Hate Speech Detection Using Contrastive Learning

Youngwook Kim, Shinwoo Park, Yo-Sub Han


Abstract
Hate speech detection has gained increasing attention with the growing prevalence of hateful content. When a text contains an obvious hate word or expression, detecting it is fairly easy. However, identifying implicit hate speech that is conveyed through nuance or context is challenging when lexical cues are insufficient. Recently, there have been several attempts to detect implicit hate speech by leveraging pre-trained language models such as BERT and HateBERT. Fine-tuning on an implicit hate speech dataset yields satisfactory performance when the model is evaluated on the test set of the dataset used for training. However, we empirically confirm that performance drops by at least 12.5 percentage points in F1 score when the model is tested on a dataset different from the one used for training. We tackle this cross-dataset underperformance problem using contrastive learning. Based on our observation that various forms of hate posts share common underlying implications, we propose a novel contrastive learning method, ImpCon, that pulls an implication and its corresponding posts close together in representation space. We evaluate the effectiveness of ImpCon by running cross-dataset evaluation on three implicit hate speech benchmarks. The experimental results in the cross-dataset setting show that ImpCon improves performance by up to 9.10% on BERT and 8.71% on HateBERT.
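To make the idea of pulling an implication and its corresponding posts close in representation space concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss over (post, implication) pairs. The abstract does not specify ImpCon's actual objective, so the loss form, the function names `cosine` and `info_nce_loss`, the temperature value, and the toy embeddings are all assumptions made for illustration rather than the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(post_embs, implication_embs, temperature=0.3):
    """Average InfoNCE-style loss over a batch of (post, implication) pairs.

    post_embs[i] and implication_embs[i] are treated as a positive pair;
    every other implication in the batch acts as a negative for post i.
    The temperature is an arbitrary illustrative value.
    """
    losses = []
    for i, post in enumerate(post_embs):
        logits = [cosine(post, imp) / temperature for imp in implication_embs]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability assigned to the matching implication.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy 2-d embeddings, made up purely for illustration.
posts = [[1.0, 0.1], [0.1, 1.0]]
aligned = [[0.9, 0.0], [0.0, 0.9]]     # each implication lies near its post
misaligned = [[0.0, 0.9], [0.9, 0.0]]  # implications swapped between posts

loss_aligned = info_nce_loss(posts, aligned)
loss_misaligned = info_nce_loss(posts, misaligned)
```

In this sketch the loss is lower when each post embedding lies near its own implication than when the pairs are swapped, which is the behaviour a training objective of this kind rewards. In practice the embeddings would come from an encoder such as BERT or HateBERT rather than being fixed by hand.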
Anthology ID:
2022.coling-1.579
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
6667–6679
URL:
https://aclanthology.org/2022.coling-1.579
Cite (ACL):
Youngwook Kim, Shinwoo Park, and Yo-Sub Han. 2022. Generalizable Implicit Hate Speech Detection Using Contrastive Learning. In Proceedings of the 29th International Conference on Computational Linguistics, pages 6667–6679, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Generalizable Implicit Hate Speech Detection Using Contrastive Learning (Kim et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.579.pdf
Code
youngwook06/impcon
Data
Implicit Hate, SBIC