Paloma Moreda Pozo

Also published as: Paloma Moreda, Paloma Moreda Pozo, Paloma Moreda-Pozo


2025

Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
Erik Derner | Sara Sansalvador De La Fuente | Yoan Gutierrez | Paloma Moreda Pozo | Nuria M Oliver
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Large language models (LLMs) often inherit and amplify social biases embedded in their training data. A prominent social bias is gender bias. In this regard, prior work has mainly focused on gender stereotyping bias – the association of specific roles or traits with a particular gender – in English and on evaluating gender bias in model embeddings or generated outputs. In contrast, gender representation bias – the unequal frequency of references to individuals of different genders – in the training corpora has received less attention. Yet such imbalances in the training data constitute an upstream source of bias that can propagate and intensify throughout the entire model lifecycle. To fill this gap, we propose a novel LLM-based method to detect and quantify gender representation bias in LLM training data in gendered languages, where grammatical gender challenges the applicability of methods developed for English. By leveraging the LLMs’ contextual understanding, our approach automatically identifies and classifies person-referencing words in gendered language corpora. Applied to four Spanish-English benchmarks and five Valencian corpora, our method reveals substantial male-dominant imbalances. We show that such biases in training data affect model outputs, but can surprisingly be mitigated leveraging small-scale training on datasets that are biased towards the opposite gender. Our findings highlight the need for corpus-level gender bias analysis in multilingual NLP. We make our code and data publicly available.
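
As an illustration of what quantifying a representation imbalance can look like once person-referencing words have been classified, here is a minimal sketch. It assumes a hypothetical list of per-word gender tags ("M" or "F") produced by some upstream classifier. The function name, the tag scheme, and the two summary statistics are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def gender_representation(labels):
    """Summarize male/female counts from per-word gender labels.

    `labels` is a hypothetical list of "M" / "F" tags, one per
    person-referencing word, assumed to come from an upstream classifier.
    """
    counts = Counter(labels)
    male, female = counts.get("M", 0), counts.get("F", 0)
    total = male + female
    return {
        "male": male,
        "female": female,
        # Share of male references among all gendered references.
        "male_share": male / total if total else None,
        # Ratio of male to female references; undefined when female == 0.
        "male_to_female": male / female if female else None,
    }


# Toy labels for illustration only, not data from the paper.
summary = gender_representation(["M", "M", "F", "M", "F", "M"])
print(summary)
```

A male-to-female ratio above 1 in such a summary would indicate a male-dominant corpus under these assumptions.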

Balancing the Scales: Addressing Gender Bias in Social Media Toxicity Detection
Beatriz Botella-Gil | Juan Pablo Consuegra-Ayala | Alba Bonet-Jover | Paloma Moreda-Pozo
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

The detection of toxic content in social media has become a critical task in Natural Language Processing (NLP), particularly given its intersection with complex issues like subjectivity, implicit language, and cultural context. Among these challenges, bias in training data remains a central concern—especially as language models risk reproducing and amplifying societal inequalities. This paper investigates the interplay between toxicity and gender bias on Twitter/X by introducing a novel dataset of violent and non-violent tweets, annotated not only for violence but also for gender. We conduct an exploratory analysis of how biased data can distort toxicity classification and present algorithms to mitigate these effects through dataset balancing and debiasing. Our contributions include four new dataset splits—two balanced and two debiased—that aim to support the development of fairer and more inclusive NLP models. By foregrounding the importance of equity in data curation, this work lays the groundwork for more ethical approaches to automated violence detection and gender annotation.

“Simple-Tool”: A Tool for the Automatic Transformation of Spanish Texts into Easy-to-Read
Beatriz Botella-Gil | Isabel Espinosa-Zaragoza | Paloma Moreda Pozo | Manuel Palomar
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Automatic Text Simplification (ATS) has emerged as a key area of research within the field of Natural Language Processing, aiming to improve access to information by reducing the linguistic complexity of texts. Simplification can be applied at various levels—lexical, syntactic, semantic, and stylistic—and must be tailored to meet the needs of different target audiences, such as individuals with cognitive disabilities, low-literacy readers, or non-native speakers. This work introduces a tool that automatically adapts Spanish texts into Easy-to-Read format, enhancing comprehension for people with cognitive or reading difficulties. The proposal is grounded in a critical review of existing Spanish-language resources and addresses the need for accessible, well-documented solutions aligned with official guidelines, reinforcing the potential of text simplification as a strategy for inclusion.

Where and How as Key Factors for Knowledge-Enhanced Constrained Commonsense Generation
Ivan Martinez-Murillo | Paloma Moreda Pozo | Elena Lloret
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

This paper addresses a key limitation in Natural Language Generation (NLG) systems: their struggle with commonsense reasoning, which is essential for generating contextually appropriate and plausible text. The study proposes an approach to enhance the commonsense reasoning abilities of NLG systems by integrating external knowledge framed in a constrained commonsense generation task. The paper investigates strategies for extracting and injecting external knowledge into pre-trained models, specifically BART and T5, in both base and large configurations. Experimental results show that incorporating external knowledge extracted with a simple strategy leads to significant improvements in performance, with the models achieving 88% accuracy in generating plausible and correct sentences. When refined methods for knowledge extraction are applied, the accuracy further increases to 92%. These findings underscore the crucial role of high-quality external knowledge in enhancing the commonsense reasoning capabilities of NLG systems, suggesting that such integration is vital for advancing their performance in real-world applications.

Detecting Deception in Disinformation across Languages: The Role of Linguistic Markers
Alba Perez-Montero | Silvia Gargova | Elena Lloret | Paloma Moreda Pozo
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

The unstoppable proliferation of news driven by the rise of digital media has intensified the challenge of news verification. Natural Language Processing (NLP) offers solutions, primarily through content and context analysis. Recognizing the vital role of linguistic analysis, this paper presents a multilingual study of linguistic markers for automated deceptive fake news detection across English, Spanish, and Bulgarian. We compiled datasets in these languages to extract and analyze both general and specific linguistic markers. We then performed feature selection using the SelectKBest algorithm, applying it to various classification models with different combinations of general and specific linguistic markers. The results show that Logistic Regression and Support Vector Machine classification models achieved F1-scores above 0.8 for English and Spanish. For Bulgarian, Random Forest yielded the best results with an F1-score of 0.73. While these markers demonstrate potential for transferability to other languages, results may vary due to inherent linguistic characteristics. This necessitates further experimentation, especially in low-resource languages like Bulgarian. These findings highlight the significant potential of our dataset and linguistic markers for multilingual deceptive news detection.

GPLSICORTEX at SemEval-2025 Task 10: Leveraging Intentions for Generating Narrative Extractions
Ivan Martinez-Murillo | María Miró Maestre | Aitana Martínez | Snorre Ralund | Elena Lloret | Paloma Moreda Pozo | Armando Suárez Cueto
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper describes our approach to address the SemEval-2025 Task 10 subtask 3, which is focused on narrative extraction given news articles with a dominant narrative. We design an external knowledge injection approach to fine-tune a Flan-T5 model so the generated narrative explanations are in line with the dominant narrative determined in each text. We also incorporate pragmatic information in the form of communicative intentions, using them as external knowledge to assist the model. This ensures that the generated texts align more closely with the intended explanations and effectively convey the expected meaning. The results show that our approach ranks 3rd in the task leaderboard (0.7428 in Macro-F1) with concise and effective news explanations. The analyses highlight the importance of adding pragmatic information when training systems to generate adequate narrative extractions.

2023

Towards an Efficient Approach for Controllable Text Generation
Iván Martínez-Murillo | Paloma Moreda | Elena Lloret
Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation

A Review of Research-Based Automatic Text Simplification Tools
Isabel Espinosa-Zaragoza | José Abreu-Salas | Elena Lloret | Paloma Moreda | Manuel Palomar
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In the age of knowledge, the democratisation of information facilitated through the Internet may not be as pervasive if written language poses challenges to particular sectors of the population. The objective of this paper is to present an overview of research-based automatic text simplification tools. To that end, we describe aspects such as the language, language phenomena, language levels simplified, approaches, specific target populations these tools are created for (e.g. individuals with cognitive impairment, attention deficit, elderly people, children, language learners), and accessibility and availability considerations. The review of existing studies covering automatic text simplification tools was conducted by searching two databases: Web of Science and Scopus. The eligibility criteria require text simplification tools to have a scientific background, so that how they operate can be ascertained. This methodology yielded 27 text simplification tools that are further analysed. The main conclusions of this review include the lack of resources accessible to the public; the need for customisation to foster the individual’s independence by allowing the user to select what s/he finds challenging to understand while not limiting the user’s capabilities; and the need for more simplification tools in languages other than English.

Automatic Text Simplification for People with Cognitive Disabilities: Resource Creation within the ClearText Project
Isabel Espinosa-Zaragoza | José Abreu-Salas | Paloma Moreda | Manuel Palomar
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

This paper presents the ongoing work conducted within the ClearText project, specifically focusing on the resource creation for the simplification of Spanish for people with cognitive disabilities. These resources include the CLEARSIM corpus and the Simple.Text tool. On the one hand, a description of the corpus compilation process with the help of APSA is detailed along with information regarding whether these texts are bronze, silver or gold standard simplification versions from the original text. The goal to reach is 18,000 texts in total by the end of the project. On the other hand, we aim to explore Large Language Models (LLMs) in a sequence-to-sequence setup for text simplification at the document level. Therefore, the tool’s objectives, technical aspects, and the preliminary results derived from early experimentation are also presented. The initial results are subject to improvement, given that experimentation is in a very preliminary stage. Despite showcasing flaws inherent to generative models (e.g. hallucinations, repetitive text), we examine the resolutions (or lack thereof) of complex linguistic phenomena that can be learned from the corpus. These issues will be addressed throughout the remainder of this project. The expected positive results from this project that will impact society are three-fold in nature: scientific-technical, social, and economic.

2017

A Domain and Language Independent Named Entity Classification Approach Based on Profiles and Local Information
Isabel Moreno | María Teresa Romá-Ferri | Paloma Moreda Pozo
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This paper presents a Named Entity Classification system that employs machine learning. Our methodology uses local entity information and profiles as its feature set. All features are generated in an unsupervised manner. The system is tested on two different data sets: (i) the DrugSemantics Spanish corpus (Overall F1 = 74.92), where results are in line with the state of the art without employing external domain-specific resources; and (ii) the English CoNLL2003 dataset (Overall F1 = 81.40), where our results are lower than previous work but are reached without external knowledge or complex linguistic analysis. Last, using the same configuration for the two corpora, the difference in overall F1 is only 6.48 points (DrugSemantics = 74.92 versus CoNLL2003 = 81.40). Thus, this result supports our hypothesis that our approach is language and domain independent and does not require any external knowledge or complex linguistic analysis.

2015

Pattern Construction for Extracting Domain Terminology
Yusney Marrero García | Paloma Moreda Pozo | Rafael Muñoz-Guillena
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

Mining Lexical Variants from Microblogs: An Unsupervised Multilingual Approach
Alejandro Mosquera | Paloma Moreda Pozo
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)

2013

Improving Web 2.0 Opinion Mining Systems Using Text Normalisation Techniques
Alejandro Mosquera | Paloma Moreda Pozo
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2011

The Use of Metrics for Measuring Informality Levels in Web 2.0 Texts
Alejandro Mosquera | Paloma Moreda
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology