Gretel Liz De la Peña Sarracén
2022
Unsupervised Embeddings with Graph Auto-Encoders for Multi-domain and Multilingual Hate Speech Detection
Gretel Liz De la Peña Sarracén
|
Paolo Rosso
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Hate speech detection is a prominent and challenging task, since hate messages are often expressed in subtle ways and with characteristics that may vary depending on the author. Hence, many models suffer from the generalization problem. However, retrieving and monitoring hateful content on social media is a current necessity. In this paper, we propose an unsupervised approach using Graph Auto-Encoders (GAE), which allows us to avoid using labeled data when training the representation of the texts. Specifically, we represent texts as nodes of a graph, and use a transformer layer together with a convolutional layer to encode these nodes in a low-dimensional space. As a result, we obtain embeddings that can be decoded into a reconstruction of the original network. Our main idea is to learn a model with a set of texts without supervision, in order to generate embeddings for the nodes: nodes with the same label should be close in the embedding space, which, in turn, should allow us to distinguish among classes. We employ this strategy to detect hate speech in multi-domain and multilingual sets of texts, where our method shows competitive results on small datasets.
2020
PRHLT-UPV at SemEval-2020 Task 8: Study of Multimodal Techniques for Memes Analysis
Gretel Liz De la Peña Sarracén
|
Paolo Rosso
|
Anastasia Giachanou
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes the system submitted by the PRHLT-UPV team for the task 8 of SemEval-2020: Memotion Analysis. We propose a multimodal model that combines pretrained models of the BERT and VGG architectures. The BERT model is used to process the textual information and VGG the images. The multimodal model is used to classify memes according to the presence of offensive, sarcastic, humorous and motivating content. Also, a sentiment analysis of memes is carried out with the proposed model. In the experiments, the model is compared with other approaches to analyze the relevance of the multimodal model. The results show encouraging performances on the final leaderboard of the competition, reaching good positions in the ranking of systems.
PRHLT-UPV at SemEval-2020 Task 12: BERT for Multilingual Offensive Language Detection
Gretel Liz De la Peña Sarracén
|
Paolo Rosso
Proceedings of the Fourteenth Workshop on Semantic Evaluation
The present paper describes the system submitted by the PRHLT-UPV team for the task 12 of SemEval-2020: OffensEval 2020. The official title of the task is Multilingual Offensive Language Identification in Social Media, and aims to identify offensive language in texts. The languages included in the task are English, Arabic, Danish, Greek and Turkish. We propose a model based on the BERT architecture for the analysis of texts in English. The approach leverages knowledge within a pre-trained model and performs fine-tuning for the particular task. In the analysis of the other languages the Multilingual BERT is used, which has been pre-trained for a large number of languages. In the experiments, the proposed method for English texts is compared with other approaches to analyze the relevance of the architecture used. Furthermore, simple models for the other languages are evaluated to compare them with the proposed one. The experimental results show that the model based on BERT outperforms other approaches. The main contribution of this work lies in this study, despite not obtaining the first positions in most cases of the competition ranking.