ConPrompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection

Youngwook Kim; Shinwoo Park; Youngsoo Namgoong; Yo-Sub Han

doi:10.18653/v1/2023.findings-emnlp.731

Abstract

Implicit hate speech detection is a challenging task in text classification since no explicit cues (e.g., swear words) exist in the text. While some pre-trained language models have been developed for hate speech detection, they are not specialized for implicit hate speech. Recently, an implicit hate speech dataset with a massive number of samples has been proposed by controlling machine generation. We propose a pre-training approach, ConPrompt, to fully leverage such machine-generated data. Specifically, given a machine-generated statement, we use the example statements of its origin prompt as positive samples for contrastive learning. Through pre-training with ConPrompt, we present ToxiGen-ConPrompt, a pre-trained language model for implicit hate speech detection. We conduct extensive experiments on several implicit hate speech datasets and show the superior generalization ability of ToxiGen-ConPrompt compared to other pre-trained models. Additionally, we empirically show that ConPrompt is effective in mitigating identity term bias, demonstrating that it not only makes a model more generalizable but also reduces unintended bias. We analyze the representation quality of ToxiGen-ConPrompt and show its ability to consider target group and toxicity, which are desirable features for implicit hate speech.
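
The abstract describes the core idea only in prose: a machine-generated statement is the anchor, and the example statements in its origin prompt serve as positives for contrastive learning. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a generic InfoNCE-style loss over cosine similarities, averaged over positives; the function names, the temperature value, and the toy 2-d "embeddings" are all hypothetical and stand in for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss, averaged over the positives (an assumed form).

    anchor    : embedding of a machine-generated statement
    positives : embeddings of the example statements in its origin prompt
    negatives : embeddings of statements from other prompts
    """
    pos_logits = [cosine(anchor, p) / temperature for p in positives]
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits

    # Numerically stable log-sum-exp over all candidates.
    top = max(all_logits)
    log_denom = top + math.log(sum(math.exp(x - top) for x in all_logits))

    # Negative mean log-probability of picking each positive.
    return -sum(x - log_denom for x in pos_logits) / len(pos_logits)


# Toy embeddings (hypothetical): the anchor lies near its prompt examples.
anchor = [1.0, 0.0]
prompt_examples = [[0.9, 0.1], [1.0, 0.2]]
other_statements = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = contrastive_loss(anchor, prompt_examples, other_statements)
loss_swapped = contrastive_loss(anchor, other_statements, prompt_examples)
```

In this toy setup the loss is lower when the anchor's true prompt examples are used as positives than when the roles are swapped, which is the behavior a contrastive objective of this kind rewards during pre-training.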

Anthology ID: 2023.findings-emnlp.731
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10964–10980
URL: https://aclanthology.org/2023.findings-emnlp.731/
DOI: 10.18653/v1/2023.findings-emnlp.731
Cite (ACL): Youngwook Kim, Shinwoo Park, Youngsoo Namgoong, and Yo-Sub Han. 2023. ConPrompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10964–10980, Singapore. Association for Computational Linguistics.
Cite (Informal): ConPrompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection (Kim et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.731.pdf
