María Victoria Cantero-Romero

Also published as: Maria Victoria Cantero Romero


2025

The PRECOM-SM Corpus: Gambling in Spanish Social Media
Pablo Álvarez-Ojeda | María Victoria Cantero-Romero | Anastasia Semikozova | Arturo Montejo-Raez
Proceedings of the 31st International Conference on Computational Linguistics

Gambling addiction is a “silent problem” in society, especially among young people in recent years, due to the easy access to betting and gambling sites on the Internet through smartphones and personal computers. As online communities in messaging apps, forums and other “teenager gathering” sites keep growing day by day, more textual information becomes available for studying the problem. This work focuses on collecting text from online Spanish-speaking communities and analysing it to find patterns in the written language of frequent and infrequent users of the collected platforms, so that an emerging gambling addiction problem can be detected. This paper introduces a newly built corpus, together with an extensive description of how it was built. In addition, some baseline experiments on the data have been carried out, using features generated from the text with different machine learning approaches, such as the bag-of-words model or deep neural network encodings.

2024

CONAN-MT-SP: A Spanish Corpus for Counternarrative Using GPT Models
María Estrella Vallecillo Rodríguez | Maria Victoria Cantero Romero | Isabel Cabrera De Castro | Arturo Montejo Ráez | María Teresa Martín Valdivia
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper describes the automated generation of CounterNarratives (CNs) for Hate Speech (HS) in Spanish using GPT-based models. Our primary objective is to evaluate the performance of these models in comparison to human capabilities. For this purpose, the English CONAN Multitarget corpus is taken as a starting point and we use the DeepL API to automatically translate it into Spanish. Two GPT-based models, GPT-3 and GPT-4, are applied to the HS segment through a few-shot prompting strategy to generate a new CN. As a result of our research, we have created a high-quality corpus in Spanish that includes the original HS-CN pairs translated into Spanish, together with the CNs generated automatically by the GPT models, which have been evaluated manually. The resulting CONAN-MT-SP corpus and its evaluation will be made available to the research community, representing the most extensive linguistic resource of CNs in Spanish to date. The results demonstrate that, although GPT-4 is more effective than GPT-3, both models can be used as systems to automatically generate CNs to combat HS. Moreover, these models outperform humans in most instances.