Damián Ariel Furman


2023

Which Argumentative Aspects of Hate Speech in Social Media can be reliably identified?
Damián Ariel Furman | Pablo Torres | José A. Rodríguez | Laura Alonso Alemany | Diego Letzen | Vanina Martínez
Proceedings of the Fourth International Workshop on Designing Meaning Representations

The expansion of Large Language Models (LLMs) into more serious areas of application, involving decision-making and the forming of public opinion, calls for a more thoughtful treatment of texts. Augmenting them with explicit and understandable argumentative analysis could foster a more reasoned usage of chatbots, text completion mechanisms or other applications. However, it is unclear which aspects of argumentation these models can reliably identify and integrate. In this paper we propose an adaptation of Wagemans (2016)’s Periodic Table of Arguments to identify different argumentative aspects of texts, with a special focus on hate speech in social media. We have empirically assessed the reliability with which each of these aspects can be automatically identified. We analyze the implications of these results and discuss how to adapt the proposal to obtain reliable representations of the aspects that cannot be successfully identified.

2022

RoBERTuito: a pre-trained language model for social media text in Spanish
Juan Manuel Pérez | Damián Ariel Furman | Laura Alonso Alemany | Franco M. Luque
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Since BERT appeared, Transformer language models and transfer learning have become state-of-the-art for natural language processing tasks. Recently, several works have focused on pre-training specially crafted models for particular domains, such as scientific papers, medical documents, and user-generated text. These domain-specific models have been shown to improve performance significantly in most tasks; however, for languages other than English, such models are not widely available. In this work, we present RoBERTuito, a pre-trained language model for user-generated text in Spanish, trained on over 500 million tweets. Experiments on a benchmark of tasks involving user-generated text showed that RoBERTuito outperformed other pre-trained language models in Spanish. In addition, our model has some cross-lingual abilities, achieving top results for English-Spanish tasks of the Linguistic Code-Switching Evaluation benchmark (LinCE) and also competitive performance against monolingual models in English Twitter tasks. To facilitate further research, we make RoBERTuito publicly available at the HuggingFace model hub together with the dataset used to pre-train it.