Data-Efficient Methods For Improving Hate Speech Detection

Sumegh Roychowdhury, Vikram Gupta


Abstract
The scarcity of large-scale datasets, especially for resource-impoverished languages, motivates the exploration of data-efficient methods for hate speech detection. Hateful intent is expressed both explicitly (through cuss, swear, or abusive words) and implicitly (indirectly and contextually). In this work, we advance implicit and explicit hate speech detection using an input-level data augmentation technique, task reformulation using entailment, and cross-learning across five languages. Our proposed data augmentation technique, EasyMix, improves performance across all English datasets by ~1% and across multilingual datasets by ~1-9%. We also observe substantial gains of ~2-8% by reformulating hate speech detection as an entailment problem. We further probe the contextual models and observe that higher layers encode implicit hate while lower layers focus on explicit hate, highlighting the importance of token-level understanding for explicit and context-level understanding for implicit hate speech detection. Code and dataset splits: https://anonymous.4open.science/r/data_efficient_hatedetect/
Anthology ID:
2023.findings-eacl.9
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
125–132
URL:
https://aclanthology.org/2023.findings-eacl.9
DOI:
10.18653/v1/2023.findings-eacl.9
Cite (ACL):
Sumegh Roychowdhury and Vikram Gupta. 2023. Data-Efficient Methods For Improving Hate Speech Detection. In Findings of the Association for Computational Linguistics: EACL 2023, pages 125–132, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Data-Efficient Methods For Improving Hate Speech Detection (Roychowdhury & Gupta, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-eacl.9.pdf
Dataset:
 2023.findings-eacl.9.dataset.zip
Video:
 https://aclanthology.org/2023.findings-eacl.9.mp4