Miguel Ángel García-Cumbreras

Also published as: Miguel Á. García Cumbreras, M. Ángel García, Miguel García-Cumbreras, Miguel Ángel García Cumbreras, Miguel A. García-Cumbreras

2026

We investigate the role of large language models (LLMs) in promoting gender-inclusive language by evaluating their ability to rewrite biased text and generate counterfactual narratives across multiple languages. We introduce a shared task with two subtasks: gender-inclusive rewriting and counterfactual generation. The task covers five languages (English, German, Spanish, Tamil, and Kannada), reflecting diverse grammatical gender systems and sociocultural contexts. We release curated word-level and sentence-level datasets to support controlled inclusive generation. A total of 50 teams registered for the shared task, and around 8 teams submitted results. Submissions are evaluated using a hybrid framework combining rubric-based automatic scoring with expert human judgment. Finally, we provide an overview of participating systems and discuss key findings and challenges observed across languages.

Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion
Bharathi Raja Chakravarthi | Bharathi B | Thenmozhi Durairaj | Miguel Ángel García Cumbreras | Salud María Jiménez-Zafra
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion

2024

This paper provides a comprehensive summary of the “Homophobia and Transphobia Detection in Social Media Comments” shared task, which was held at LT-EDI@EACL 2024. The objective of this task was to develop systems capable of identifying instances of homophobia and transphobia within social media comments. The challenge covered ten languages: English, Tamil, Malayalam, Telugu, Kannada, Gujarati, Hindi, Marathi, Spanish, and Tulu. Each comment in the dataset was annotated with one of three categories. The shared task attracted significant interest, with over 60 teams participating through the CodaLab platform. The predictions submitted by participants were evaluated using the macro F1 score.

Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
Bharathi Raja Chakravarthi | Bharathi B | Paul Buitelaar | Thenmozhi Durairaj | György Kovács | Miguel Ángel García Cumbreras
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

2023

Hope serves as a powerful driving force that encourages individuals to persevere in the face of the unpredictable nature of human existence. It instills motivation within us to remain steadfast in our pursuit of important goals, regardless of the uncertainties that lie ahead. In today’s digital age, platforms such as Facebook, Twitter, Instagram, and YouTube have emerged as prominent social media outlets where people freely express their views and opinions. These platforms have also become crucial for marginalized individuals seeking online assistance and support[1][2][3]. The outbreak of the pandemic has exacerbated people’s fears around the world, as they grapple with the possibility of losing loved ones and the lack of access to essential services such as schools, hospitals, and mental health facilities.

Overview of Second Shared Task on Homophobia and Transphobia Detection in Social Media Comments
Bharathi Raja Chakravarthi | Rahul Ponnusamy | Malliga Subramanian | Paul Buitelaar | Miguel Ángel García-Cumbreras | Salud María Jiménez-Zafra | José Antonio García-Díaz | Rafael Valencia-García | Nitesh Jindal
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

We present an overview of the second shared task on homophobia/transphobia detection in social media comments. Given a comment, a system must predict whether or not it contains any form of homophobia/transphobia. The shared task included five languages: English, Spanish, Tamil, Hindi, and Malayalam. The data was provided for two tasks: Task A used three labels, and Task B used seven fine-grained labels. In total, 75 teams enrolled for the shared task on CodaLab. For Task A, 12 teams submitted systems for English, eight teams for Tamil, eight teams for Spanish, and seven teams for Hindi. For Task B, nine teams submitted systems for English, seven teams for Tamil, and six teams for Malayalam. We present and analyze all submissions in this paper.

2022

Hope Speech detection is the task of classifying a sentence as hope speech or non-hope speech given a corpus of sentences. Hope speech is any message or content that is positive, encouraging, reassuring, inclusive and supportive that inspires and engenders optimism in the minds of people. In contrast to identifying and censoring negative speech patterns, hope speech detection is focussed on recognising and promoting positive speech patterns online. In this paper, we report an overview of the findings and results from the shared task on hope speech detection for Tamil, Malayalam, Kannada, English and Spanish languages conducted in the second workshop on Language Technology for Equality, Diversity and Inclusion (LT-EDI-2022) organised as a part of ACL 2022. The participants were provided with annotated training & development datasets and unlabelled test datasets in all the five languages. The goal of the shared task is to classify the given sentences into one of the two hope speech classes. The performances of the systems submitted by the participants were evaluated in terms of micro-F1 score and weighted-F1 score. The datasets for this challenge are openly available

2019

SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions
Miguel A. García-Cumbreras | Salud María Jiménez-Zafra | Arturo Montejo-Ráez | Manuel Carlos Díaz-Galiano | Estela Saquete
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the participation of the SINAI-DL team at RumourEval (Task 7 in SemEval 2019, subtask A: SDQC). SDQC addresses the challenge of rumour stance classification as an indirect way of identifying potential rumours. Given a tweet with several replies, our system classifies each reply as supporting, denying, questioning or commenting on the underlying rumour. We have applied data augmentation, temporal expression labelling and transfer learning with a four-layer neural classifier. We achieve an accuracy of 0.715 with the official run over reply tweets.

SINAI-DL at SemEval-2019 Task 5: Recurrent networks and data augmentation by paraphrasing
Arturo Montejo-Ráez | Salud María Jiménez-Zafra | Miguel A. García-Cumbreras | Manuel Carlos Díaz-Galiano
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the participation of the SINAI-DL team at Task 5 in SemEval 2019, called HatEval. We have applied some classic neural network layers, such as word embeddings and LSTM, to build a neural classifier for both proposed tasks. Because the amount of training data provided is small compared to what deep architectures need for an adequate learning stage, we explore the use of paraphrasing tools as a source for data augmentation. Our results show that this method is promising, as some improvement has been found over non-augmented training sets.
