Multi-domain Hate Speech Detection Using Dual Contrastive Learning and Paralinguistic Features

Somaiyeh Dehghan; Berrin Yanıkoğlu

Abstract

Social networks have become venues where people can share and spread hate speech, especially when the platforms allow users to remain anonymous. Hate speech can have significant social and cultural effects, especially when it targets specific groups of people on the basis of religion, race, ethnicity, culture, or a specific social situation, such as immigrants and refugees. In this study, we propose a hate speech detection model, BERTurk-DualCL, trained with a mixed objective that combines a contrastive learning loss with the traditional cross-entropy loss used for classification. In addition, we study the effects of paralinguistic features, namely emojis and hashtags, on the performance of our model. We trained and evaluated our model on tweets from two separate datasets covering four heatedly discussed topics, ranging from discussions about migrants to the Israel-Palestine conflict. Our multi-domain model outperforms comparable results in the literature and the average results of four domain-specific models, achieving macro-F1 scores of 81.04% and 58.89% on the two- and five-class tasks, respectively.
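
To illustrate what a mixed objective of this kind can look like, here is a minimal pure-Python sketch that adds a weighted contrastive term to the mean cross-entropy. It is not the authors' DualCL formulation, which this page does not specify: the contrastive term is a generic supervised contrastive loss used as a stand-in, and the function names, the `weight` parameter and the `temperature` value are all assumptions made for illustration.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one example, computed from raw class logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (a stand-in, not the paper's term).

    Each anchor is pulled toward same-label examples and pushed away
    from all other examples in the batch.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        sims = {j: cosine(embeddings[i], embeddings[j]) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def mixed_objective(logits_batch, embeddings, labels, weight=0.5, temperature=0.1):
    """Mean cross-entropy plus a weighted contrastive term (weight is hypothetical)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    cl = supervised_contrastive(embeddings, labels, temperature)
    return ce + weight * cl
```

In this sketch, setting `weight=0` reduces the objective to plain cross-entropy, and the contrastive term is small when embeddings that share a label point in similar directions.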

Anthology ID: 2024.lrec-main.1025
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 11745–11755
URL: https://aclanthology.org/2024.lrec-main.1025/
Cite (ACL): Somaiyeh Dehghan and Berrin Yanıkoğlu. 2024. Multi-domain Hate Speech Detection Using Dual Contrastive Learning and Paralinguistic Features. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11745–11755, Torino, Italia. ELRA and ICCL.
Cite (Informal): Multi-domain Hate Speech Detection Using Dual Contrastive Learning and Paralinguistic Features (Dehghan & Yanıkoğlu, LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1025.pdf