Beatriz Botella-Gil
2025
Balancing the Scales: Addressing Gender Bias in Social Media Toxicity Detection
Beatriz Botella-Gil
|
Juan Pablo Consuegra-Ayala
|
Alba Bonet-Jover
|
Paloma Moreda-Pozo
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
The detection of toxic content in social media has become a critical task in Natural Language Processing (NLP), particularly given its intersection with complex issues like subjectivity, implicit language, and cultural context. Among these challenges, bias in training data remains a central concern—especially as language models risk reproducing and amplifying societal inequalities. This paper investigates the interplay between toxicity and gender bias on Twitter/X by introducing a novel dataset of violent and non-violent tweets, annotated not only for violence but also for gender. We conduct an exploratory analysis of how biased data can distort toxicity classification and present algorithms to mitigate these effects through dataset balancing and debiasing. Our contributions include four new dataset splits—two balanced and two debiased—that aim to support the development of fairer and more inclusive NLP models. By foregrounding the importance of equity in data curation, this work lays the groundwork for more ethical approaches to automated violence detection and gender annotation.
“Simple-Tool”: A Tool for the Automatic Transformation of Spanish Texts into Easy-to-Read
Beatriz Botella-Gil
|
Isabel Espinosa-Zaragoza
|
Paloma Moreda Pozo
|
Manuel Palomar
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Automatic Text Simplification (ATS) has emerged as a key area of research within the field of Natural Language Processing, aiming to improve access to information by reducing the linguistic complexity of texts. Simplification can be applied at various levels—lexical, syntactic, semantic, and stylistic—and must be tailored to meet the needs of different target audiences, such as individuals with cognitive disabilities, low-literacy readers, or non-native speakers. This work introduces a tool that automatically adapts Spanish texts into Easy-to-Read format, enhancing comprehension for people with cognitive or reading difficulties. The proposal is grounded in a critical review of existing Spanish-language resources and addresses the need for accessible, well-documented solutions aligned with official guidelines, reinforcing the potential of text simplification as a strategy for inclusion.