Cross-Lingual Transfer Learning for Hate Speech Detection

Irina Bigoulaeva, Viktor Hangya, Alexander Fraser


Abstract
We address the task of automatic hate speech detection for low-resource languages. Rather than collecting and annotating new hate speech data, we show how to use cross-lingual transfer learning to leverage existing data from higher-resource languages. Using classifiers based on bilingual word embeddings, we achieve good performance on the target language by training only on the source dataset. Using our transferred system, we bootstrap on unlabeled target-language data, improving the performance of standard cross-lingual transfer approaches. We use English as the high-resource language and German as the target language, for which only a small number of annotated corpora are available. Our results indicate that cross-lingual transfer learning, together with our approach to leveraging additional unlabeled data, is an effective way of achieving good performance on low-resource target languages without the need for any target-language annotations.
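The two steps named in the abstract (training on source-language data in a shared bilingual embedding space, then bootstrapping on unlabeled target-language data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy `EMB` table stands in for real bilingual word embeddings, the nearest-centroid classifier stands in for whatever classifiers the paper uses, and the `threshold` and `rounds` parameters are hypothetical.

```python
from math import sqrt

# Hypothetical toy bilingual embeddings: English and German words live in
# one shared 2-d space (real systems use learned, high-dimensional vectors).
EMB = {
    "hate": (1.0, 0.0), "hass": (0.9, 0.1),
    "idiots": (0.8, 0.2), "idioten": (0.8, 0.1),
    "love": (0.0, 1.0), "liebe": (0.1, 0.9),
    "friends": (0.2, 0.8), "freunde": (0.1, 0.8),
}

def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def train(examples):
    """Nearest-centroid classifier: one mean sentence vector per label."""
    sums, counts = {}, {}
    for text, label in examples:
        v = embed(text)
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += v[0]
        s[1] += v[1]
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab])
            for lab, s in sums.items()}

def predict(model, text):
    """Return (label, margin); margin is the gap between the top two scores."""
    v = embed(text)
    scored = sorted(((cosine(v, c), lab) for lab, c in model.items()),
                    reverse=True)
    margin = scored[0][0] - scored[1][0] if len(scored) > 1 else scored[0][0]
    return scored[0][1], margin

def bootstrap(source_labeled, target_unlabeled, threshold=0.5, rounds=2):
    """Self-training: pseudo-label confident target texts, then retrain."""
    train_set = list(source_labeled)
    pool = list(target_unlabeled)
    model = train(train_set)  # step 1: train on source-language data only
    for _ in range(rounds):
        scored = [(t, predict(model, t)) for t in pool]
        added = [(t, lab) for t, (lab, m) in scored if m >= threshold]
        if not added:
            break
        train_set.extend(added)  # step 2: add pseudo-labeled target texts
        taken = {t for t, _ in added}
        pool = [t for t in pool if t not in taken]
        model = train(train_set)
    return model
```

In this sketch, a call such as `bootstrap([("hate idiots", "hate"), ("love friends", "other")], ["hass idioten", "liebe freunde"])` trains on the two English examples and then folds confidently labeled German sentences back into the training set. No German labels are used at any point, which mirrors the setting the abstract describes.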
Anthology ID:
2021.ltedi-1.3
Volume:
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion
Month:
April
Year:
2021
Address:
Kyiv
Editors:
Bharathi Raja Chakravarthi, John P. McCrae, Manel Zarrouk, Kalika Bali, Paul Buitelaar
Venue:
LTEDI
Publisher:
Association for Computational Linguistics
Pages:
15–25
URL:
https://aclanthology.org/2021.ltedi-1.3
Cite (ACL):
Irina Bigoulaeva, Viktor Hangya, and Alexander Fraser. 2021. Cross-Lingual Transfer Learning for Hate Speech Detection. In Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, pages 15–25, Kyiv. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Transfer Learning for Hate Speech Detection (Bigoulaeva et al., LTEDI 2021)
PDF:
https://aclanthology.org/2021.ltedi-1.3.pdf
Dataset:
 2021.ltedi-1.3.Dataset.txt
Data
Hate Speech