Zero-shot Cross-lingual Content Filtering: Offensive Language and Hate Speech Detection

Andraž Pelicon, Ravi Shekhar, Matej Martinc, Blaž Škrlj, Matthew Purver, Senja Pollak


Abstract
We present a system for zero-shot cross-lingual offensive language and hate speech classification. The system was trained on English datasets and then tested, without any additional training, on detecting hate speech and offensive social media content in several languages. Experiments show that both models, the offensive language classifier and the hate speech classifier, generalize well from English to other languages. As expected, however, there is a performance gap between the cross-lingual models and monolingual models. The best-performing model (the offensive content classifier) is available online as a REST API.
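The zero-shot protocol described in the abstract (train once on English, then evaluate on every target language with no further training) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `zero_shot_eval` and `macro_f1`, the binary label convention, and the choice of macro-F1 as the metric are all hypothetical. In the real system the classifier would be a model whose representations are shared across languages; here any text-to-label function can be plugged in.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a classifier maps a text to a label
# (assumed convention: 0 = not offensive, 1 = offensive).
Classifier = Callable[[str], int]


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over labels seen in gold or pred (assumed metric)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def zero_shot_eval(
    train_fn: Callable[[List[Tuple[str, int]]], Classifier],
    english_train: List[Tuple[str, int]],
    test_sets: Dict[str, List[Tuple[str, int]]],
) -> Dict[str, float]:
    """Train once on English data, then score each target language without retraining."""
    clf = train_fn(english_train)  # the only training step in the whole protocol
    results = {}
    for lang, data in test_sets.items():
        gold = [y for _, y in data]
        pred = [clf(x) for x, _ in data]  # inference only, no target-language labels used
        results[lang] = macro_f1(gold, pred)
    return results
```

A monolingual baseline would instead call `train_fn` separately on each language's own training data; comparing the two result dictionaries per language gives the cross-lingual versus monolingual gap that the abstract mentions.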
Anthology ID: 2021.hackashop-1.5
Volume: Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Month: April
Year: 2021
Address: Online
Editors: Hannu Toivonen, Michele Boggia
Venue: Hackashop
Publisher: Association for Computational Linguistics
Pages: 30–34
URL: https://aclanthology.org/2021.hackashop-1.5
Cite (ACL): Andraž Pelicon, Ravi Shekhar, Matej Martinc, Blaž Škrlj, Matthew Purver, and Senja Pollak. 2021. Zero-shot Cross-lingual Content Filtering: Offensive Language and Hate Speech Detection. In Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation, pages 30–34, Online. Association for Computational Linguistics.
Cite (Informal): Zero-shot Cross-lingual Content Filtering: Offensive Language and Hate Speech Detection (Pelicon et al., Hackashop 2021)
PDF: https://aclanthology.org/2021.hackashop-1.5.pdf
Data: HatEval, OLID, Xhate999