Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon

Endang Wahyu Pamungkas, Viviana Patti


Abstract
The development of computational methods to detect abusive language in social media within variable and multilingual contexts has recently gained significant traction. The growing interest is confirmed by the large number of benchmark corpora for different languages developed in the latest years. However, abusive language behaviour is multifaceted and available datasets are featured by different topical focuses. This makes abusive language detection a domain-dependent task, and building a robust system to detect general abusive content a first challenge. Moreover, most resources are available for English, which makes detecting abusive language in low-resource languages a further challenge. We address both challenges by considering ten publicly available datasets across different domains and languages. A hybrid approach with deep learning and a multilingual lexicon to cross-domain and cross-lingual detection of abusive content is proposed and compared with other simpler models. We show that training a system on general abusive language datasets will produce a cross-domain robust system, which can be used to detect other more specific types of abusive content. We also found that using the domain-independent lexicon HurtLex is useful to transfer knowledge between domains and languages. In the cross-lingual experiment, we demonstrate the effectiveness of our jointlearning model also in out-domain scenarios.
Anthology ID:
P19-2051
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Fernando Alva-Manchego, Eunsol Choi, Daniel Khashabi
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
363–370
Language:
URL:
https://aclanthology.org/P19-2051
DOI:
10.18653/v1/P19-2051
Bibkey:
Cite (ACL):
Endang Wahyu Pamungkas and Viviana Patti. 2019. Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 363–370, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Cross-domain and Cross-lingual Abusive Language Detection: A Hybrid Approach with Deep Learning and a Multilingual Lexicon (Pamungkas & Patti, ACL 2019)
Copy Citation:
PDF:
https://aclanthology.org/P19-2051.pdf
Data
HatEval