Alba María Mármol-Romero

Also published as: Alba María Mármol Romero

2026

Nuanced Toxicity Detection in Spanish: A New Corpus and Benchmark Study
Alba María Mármol-Romero | Robiert Sepúlveda-Torres | Estela Saquete | María-Teresa Martín-Valdivia | L. Alfonso Ureña
Findings of the Association for Computational Linguistics: EACL 2026

The rise of toxic content on digital platforms has intensified the demand for automatic moderation tools. While English has benefited from large-scale annotated corpora, Spanish remains under-resourced, particularly for nuanced cases of toxicity such as irony, sarcasm, or indirect aggression. We present an extended version of the NECOS-TOX corpus, comprising 4,011 Spanish comments collected from 16 major news outlets. Each comment is annotated across three levels of toxicity (Non-Toxic, Slightly Toxic, and Toxic), following an iterative annotation protocol that achieved substantial inter-annotator agreement (k = 0.74). To reduce annotation costs while maintaining quality, we employed a human-in-the-loop active learning strategy, with manual correction of model pre-labels. We benchmarked the dataset with traditional machine learning (ML) methods, domain-specific transformers, and instruction-tuned large language models (LLMs). Results show that compact encoder models (e.g., RoBERTa-base-bne, 125M parameters) perform on par with much larger models (e.g., LLaMA-3.1-8B), underscoring the value of in-domain adaptation over raw scale. Our error analysis highlights persistent challenges in distinguishing subtle forms of toxicity, especially sarcasm and implicit insults, and reveals entity-related biases that motivate anonymization strategies. The dataset and trained models are released publicly.

2024

pdf bib abs

Environmental Impact Measurement in the MentalRiskES Evaluation Campaign
Alba María Mármol-Romero | Adrián Moreno-Muñoz | Flor Miriam Plaza-Del-Arco | M. Dolores Molina-González | Arturo Montejo-Ráez
Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING 2024

With the rise of Large Language Models (LLMs), the NLP community is increasingly aware of the environmental consequences of model development due to the energy consumed for training and running these models. This study investigates the energy consumption and environmental impact of systems participating in the MentalRiskES shared task, at the Iberian Language Evaluation Forum (IberLEF) in the year 2023, which focuses on early risk identification of mental disorders in Spanish comments. Participants were asked to submit, for each prediction, a set of efficiency metrics, being carbon dioxide emissions among them. We conduct an empirical analysis of the data submitted considering model architecture, task complexity, and dataset characteristics, covering a spectrum from traditional Machine Learning (ML) models to advanced LLMs. Our findings contribute to understanding the ecological footprint of NLP systems and advocate for prioritizing environmental impact assessment in shared tasks to foster sustainability across diverse model types and approaches, being evaluation campaigns an adequate framework for this kind of analysis.

pdf bib abs

MentalRiskES: A New Corpus for Early Detection of Mental Disorders in Spanish
Alba María Mármol Romero | Adrián Moreno-Muñoz | Flor Miriam Plaza-Del-Arco | M. Dolores Molina-González | Arturo Montejo-Ráez
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

With mental health issues on the rise on the Web, especially among young people, there is a growing need for effective identification and intervention. In this paper, we introduce a new open-sourced corpus for the early detection of mental disorders in Spanish, focusing on eating disorders, depression, and anxiety. It consists of user messages posted on groups within the Telegram message platform and contains over 1,300 subjects with more than 45,000 messages posted in different public Telegram groups. This corpus has been manually annotated via crowdsourcing and is prepared for its use in several Natural Language Processing tasks including text classification and regression tasks. The samples in the corpus include both text and time data. To provide a benchmark for future research, we conduct experiments on text classification and regression by using state-of-the-art transformer-based models.

Co-authors

Estela Saquete 1

Robiert Sepúlveda-Torres 1

L. Alfonso Ureña 1

Venues

Fix author