Fernando Sanchez-Vega

Also published as: Fernando Sánchez Vega, Fernando Sánchez-Vega, Fernando Sanchez - Vega

2025

We present the Mu-SHROOM shared task, which focuses on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models (LLMs). Mu-SHROOM addresses general-purpose LLMs in 14 languages and frames the hallucination detection problem as a span-labeling task. We received 2,618 submissions from 43 participating teams employing diverse methodologies. The very high number of submissions highlights the community's interest in hallucination detection. We present the results of the participating systems and provide an empirical analysis to better understand the factors that can lead to strong performance in this task. We also underscore current challenges, notably the varying degree of hallucination across languages and the high annotator disagreement when labeling hallucination spans.

NLP-Cimat at SemEval-2025 Task 11: Prompt Optimization for LLMs via Genetic Algorithms and Systematic Mutation applied on Emotion Detection
Guillermo Segura-Gómez | Adrian Pastor Lopez Monroy | Fernando Sanchez - Vega | Alejandro Rosales Pérez
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Large Language Models (LLMs) have shown remarkable performance across diverse natural language processing tasks in recent years. However, optimizing instructions to maximize model performance remains a challenge due to the vast search space and the nonlinear relationship between input structure and output quality. This work explores an alternative prompt optimization technique based on genetic algorithms with different structured mutation processes. Unlike traditional random mutations, our method introduces variability in each generation through a guided mutation, increasing the likelihood of obtaining better prompts in each generation. We apply this approach to emotion detection in the context of SemEval-2025 Task 11, demonstrating its potential to improve prompt efficiency and, consequently, task performance. Experimental results show that our method yields competitive results compared to standard optimization techniques while maintaining interpretability and scalability.

2024

Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations
Emilio Cueva | Adrian Lopez Monroy | Fernando Sánchez-Vega | Thamar Solorio
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Zero-Shot Cross-lingual Transfer (ZS-XLT) uses a model trained in a source language to make predictions in another language, often with a loss in performance. To alleviate this, further improvements can be achieved through subsequent adaptation using examples in the target language. In this paper, we exploit In-Context Tuning (ICT) for One-Shot Cross-lingual transfer in classification by introducing In-Context Cross-lingual Transfer (IC-XLT). The key idea is to train a model to learn from context examples and then adapt it at inference time to a target language by prepending a One-Shot context demonstration in that language. Our results show that IC-XLT successfully leverages target-language examples to improve the cross-lingual capabilities of the evaluated mT5 model, outperforming prompt-based models adapted through fine-tuning in the Zero- and Few-shot scenarios. Moreover, we show that when source-language data is limited, the fine-tuning framework employed for IC-XLT performs comparably to prompt-based fine-tuning with significantly more training data in the source language.

DAIC-WOZ: On the Validity of Using the Therapist’s prompts in Automatic Depression Detection from Clinical Interviews
Sergio Burdisso | Ernesto Reyes-Ramírez | Esaú Villatoro-tello | Fernando Sánchez-Vega | Adrian Lopez Monroy | Petr Motlicek
Proceedings of the 6th Clinical Natural Language Processing Workshop

Automatic depression detection from conversational data has gained significant interest in recent years. The DAIC-WOZ dataset, which consists of interviews conducted by a human-controlled virtual agent, has been widely used for this task. Recent studies have reported enhanced performance when incorporating the interviewer’s prompts into the model. In this work, we hypothesize that this improvement might be mainly due to a bias present in these prompts, rather than to the proposed architectures and methods. Through ablation experiments and qualitative analysis, we discover that models using the interviewer’s prompts learn to focus on a specific region of the interviews, where questions about past experiences with mental health issues are asked, and use them as discriminative shortcuts to detect depressed participants. In contrast, models using participant responses gather evidence from across the entire interview. Finally, to highlight the magnitude of this bias, we achieve a 0.90 F1 score by intentionally exploiting it, the highest result reported to date on this dataset using only textual information. Our findings underline the need for caution when incorporating interviewers’ prompts into models, as they may inadvertently learn to exploit targeted prompts rather than learning to characterize the language and behavior that are genuinely indicative of the patient’s mental health condition.

Improving aggressiveness detection using a data augmentation technique based on a Diffusion Language Model
Antonio D. Reyes-Ramírez | Mario Ezra Aragón | Fernando Sánchez-Vega | A. Pastor López-Monroy
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

Cyberbullying has grown in recent years, largely due to the proliferation of social media users. This phenomenon manifests in various forms, such as hate speech and offensive language, increasing the need for effective detection models. Most approaches rely on supervised algorithms, which have an important drawback: they depend heavily on the availability of ample training data. This paper attempts to tackle this data scarcity problem using data augmentation (DA) techniques. Concretely, we propose a novel data augmentation technique based on a Diffusion Language Model (DLA). We compare our proposed method against well-known DA techniques, such as contextual augmentation and Easy Data Augmentation (EDA). Our findings reveal a slight but promising improvement, leading to more robust results with very low variance. Additionally, we provide a comprehensive qualitative analysis of classification errors, along with complementary analyses, shedding light on the nuances of our approach.

2023

Walter Burns at SemEval-2023 Task 5: NLP-CIMAT - Leveraging Model Ensembles for Clickbait Spoiling
Emilio Villa Cueva | Daniel Vallejo Aldana | Fernando Sánchez Vega | Adrián Pastor López Monroy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our participation in the Clickbait challenge at SemEval-2023. In this work, we address the Clickbait classification task using transformer models in an ensemble configuration. We tackle the Spoiler Generation task using a two-level ensemble strategy of models trained for extractive QA, and selecting the best K candidates for multi-part spoilers. On the test partitions, our approaches obtained an accuracy of 0.716 for classification and a BLEU-4 score of 0.439 for spoiler generation.

CIMAT-NLP@LT-EDI-2023: Finegrain Depression Detection by Multiple Binary Problems Approach
María de Jesús García Santiago | Fernando Sánchez Vega | Adrián Pastor López Monroy
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

This paper describes the participation of the CIMAT-NLP team in the shared task on Detecting Signs of Depression from Social Media Text at LT-EDI@RANLP 2023, which consists of classifying social media text into three levels: “not depression”, “moderate” depression, and “severe” depression. We propose two approaches: (1) a transformer model that can handle long texts without truncating them, and (2) an ensemble of six binary Bag of Words models. Our team placed fourth in the competition, and we found that models trained with our approaches could place second.

Dynamic Regularization in UDA for Transformers in Multimodal Classification
Ivonne Monter-Aldana | Adrian Pastor Lopez Monroy | Fernando Sanchez-Vega
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal machine learning is a cutting-edge field that explores ways to incorporate information from multiple sources into models. As more multimodal data becomes available, this field has become increasingly relevant. This work focuses on two key challenges in multimodal machine learning. The first is finding efficient ways to combine information from different data types. The second is that one modality (e.g., text) is often stronger and more relevant, making it difficult to identify meaningful patterns in the weaker modality (e.g., image). Our approach focuses on exploiting the weaker modality more effectively while dynamically regularizing the loss function. First, we introduce a new two-stream model called Multimodal BERT-ViT, which features a novel intra-CLS token fusion. Second, we devise a dynamic adjustment that maintains a balance between specialization and generalization during training to avoid overfitting, and we add this dynamic adjustment to the Unsupervised Data Augmentation (UDA) framework. We evaluate the effectiveness of these proposals on multi-label movie genre classification using the Moviescope and MM-IMDb datasets. The evaluation revealed that our proposal offers substantial benefits while enabling us to harness the weaker modality without compromising the information provided by the stronger one.

2018

INAOE-UPV at SemEval-2018 Task 3: An Ensemble Approach for Irony Detection in Twitter
Delia Irazú Hernández Farías | Fernando Sánchez-Vega | Manuel Montes-y-Gómez | Paolo Rosso
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes an ensemble approach to SemEval-2018 Task 3. The proposed method combines two well-known text classification methods with a novel approach for capturing ironic content that exploits a lexicon tailored for irony detection. We experimented with different ensemble settings. The results show that our method performs well at detecting the presence of ironic content on Twitter.

2013

INAOE_UPV-CORE: Extracting Word Associations from Document Corpora to estimate Semantic Textual Similarity
Fernando Sánchez-Vega | Manuel Montes-y-Gómez | Paolo Rosso | Luis Villaseñor-Pineda
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity