Also published as: Jorge Perez
Resources for Multilingual Hate Speech Detection
Ayme Arango Monnar | Jorge Perez | Barbara Poblete | Magdalena Saldaña | Valentina Proust
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
Most of the published approaches and resources for hate speech detection are tailored for the English language. In consequence, cross-lingual and cross-cultural perspectives lack some essential resources.The lack of diversity of the datasets in Spanish is notable. Variations throughout Spanish-speaking countries make existing datasets not enough to encompass the task in the different Spanish variants. We annotated 9834 tweets from Chile to enrich the existing Spanish resources with different words and new targets of hate that have not been considered in previous studies.We conducted several cross-dataset evaluation experiments of the models published in the literature using our Chilean dataset and two others in English and Spanish. We propose a comparative framework for quickly conducting comparative experiments using different previously published models.In addition, we set up a Codalab competition for further comparison of new models in a standard scenario, that is, data partitions and evaluation metrics. All resources can be accessed trough a centralized repository for researchers to get a complete picture of the progress on the multilingual hate speech and offensive language detection task.
In this paper we present the dataset of 200,000+ political arguments produced in the local phase of the 2016 Chilean constitutional process. We describe the human processing of this data by the government officials, and the manual tagging of arguments performed by members of our research group. Afterwards we focus on classification tasks that mimic the human processes, comparing linear methods with neural network architectures. The experiments show that some of the manual tasks are suitable for automatization. In particular, the best methods achieve a 90% top-5 accuracy in a multi-class classification of arguments, and 65% macro-averaged F1-score for tagging arguments according to a three-part argumentation model.