Identification of Multiword Expressions in Tweets for Hate Speech Detection

Nicolas Zampieri, Carlos Ramisch, Irina Illina, Dominique Fohr


Abstract
Multiword expression (MWE) identification in tweets is a challenging task due to the complex linguistic nature of MWEs combined with the non-standard language use in social networks. MWE features have been shown to be helpful for hate speech detection (HSD). In this article, we present joint experiments on these two related tasks on English Twitter data: first we focus on the MWE identification task, and then we observe the influence of MWE-based features on the HSD task. For MWE identification, we compare the performance of two systems: a lexicon-based system and a deep neural network-based (DNN) system. We experimentally evaluate seven configurations of a state-of-the-art DNN system based on recurrent networks using pre-trained contextual embeddings from BERT. The DNN-based system outperforms the lexicon-based one thanks to its superior generalisation power, yielding much better recall. For the HSD task, we propose a new DNN architecture for incorporating MWE features. We confirm that MWE features are helpful for the HSD task. Moreover, the proposed DNN architecture beats previous MWE-based HSD systems by 0.4 to 1.1 F-measure points on average on four Twitter HSD corpora.
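The lexicon-based approach to MWE identification named in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the toy lexicon `LEXICON`, the functions `tag_mwes` and `mwe_feature`, and the choice of greedy longest-match lookup with BIO tags are hypothetical assumptions made only for illustration. The per-token output shows the kind of MWE feature that could, in principle, be passed to an HSD classifier.

```python
from typing import List, Sequence

# Hypothetical toy lexicon (assumption): each MWE is a tuple of lowercased tokens.
# A real lexicon would be far larger, e.g. extracted from annotated corpora.
LEXICON = {
    ("kick", "the", "bucket"),
    ("look", "up"),
    ("piece", "of", "cake"),
}
MAX_LEN = max(len(mwe) for mwe in LEXICON)


def tag_mwes(tokens: Sequence[str]) -> List[str]:
    """Greedy longest-match lexicon lookup; returns one BIO tag per token."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first (down to 2 tokens) so longer MWEs win.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in LEXICON:
                matched = n
                break
        if matched:
            tags[i] = "B"  # first token of the MWE
            for j in range(i + 1, i + matched):
                tags[j] = "I"  # remaining tokens of the MWE
            i += matched
        else:
            i += 1
    return tags


def mwe_feature(tags: Sequence[str]) -> List[int]:
    """Hypothetical binary per-token feature: 1 if the token is part of an MWE."""
    return [0 if t == "O" else 1 for t in tags]


if __name__ == "__main__":
    tweet = ["That", "exam", "was", "a", "piece", "of", "cake"]
    tags = tag_mwes(tweet)
    print(tags)               # ['O', 'O', 'O', 'O', 'B', 'I', 'I']
    print(mwe_feature(tags))  # [0, 0, 0, 0, 1, 1, 1]
```

Because this lookup only matches exact, contiguous token sequences, an inflected form such as "kicked the bucket" is missed. This illustrates one plausible reason a lexicon-based identifier can reach lower recall than a DNN that generalises beyond the entries it has seen.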
Anthology ID:
2022.lrec-1.22
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
202–210
URL:
https://aclanthology.org/2022.lrec-1.22
Cite (ACL):
Nicolas Zampieri, Carlos Ramisch, Irina Illina, and Dominique Fohr. 2022. Identification of Multiword Expressions in Tweets for Hate Speech Detection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 202–210, Marseille, France. European Language Resources Association.
Cite (Informal):
Identification of Multiword Expressions in Tweets for Hate Speech Detection (Zampieri et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.22.pdf
Data
HatEval, STREUSLE