Eugenio Martínez-Cámara

Also published as: Eugenio Martinez Camara, Eugenio Martínez Cámara, Eugenio Martínez Cámara

2025

pdf bib abs
SemEval-2025 Task 8: Question Answering over Tabular Data
Jorge Osés Grijalba | L. Alfonso Ureñ - López | Eugenio Martínez Cámara | Jose Camacho - Collados
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We introduce the findings and results of SemEval-2025 Task 8: Question Answering over Tabular Data. We featured two subtasks, DataBench and DataBench Lite. DataBench consists on question answering over tabular data, and DataBench Lite small comprising small datasets that might be easier to manage by current models by for example fitting them into a prompt. The task was open for any approach, but their answer has to conform to a required typing format. In this paper we present the task, analyze a number of system submissions and discuss the results. The results show how approaches leveraging LLMs dominated the task, with larger models exhibiting a considerably superior performance compared to small models.

2024

pdf bib abs
Question Answering over Tabular Data with DataBench: A Large-Scale Empirical Evaluation of LLMs
Jorge Osés Grijalba | L. Alfonso Ureña-López | Eugenio Martínez Cámara | Jose Camacho-Collados
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large Language Models (LLMs) are showing emerging abilities, and one of the latest recognized ones deals with their ability to reason and answer questions from tabular data. Although there are some available datasets to assess question answering systems on tabular data, they are not large and diverse enough to properly assess the capabilities of LLMs. To this end, we propose DataBench, a benchmark composed of 65 real-world datasets over several domains, including 20 human-generated questions per dataset, totaling 1300 questions and answers overall. Using this benchmark, we perform a large-scale empirical comparison of several open and closed source models, including both code-generating and in-context learning models. The results highlight the current gap between open-source and closed-source models, with all types of model having room for improvement even in simple boolean questions or involving a single column.

pdf bib abs
The Influence of the Perplexity Score in the Detection of Machine-generated Texts
Alberto José Gutiérrez Megías | L. Alfonso Ureña-López | Eugenio Martínez Cámara
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security

The high performance of large language models (LLM) generating natural language represents a real threat, since they can be leveraged to generate any kind of deceptive content. Since there are still disparities among the language generated by machines and the human language, we claim that perplexity may be used as classification signal to discern between machine and human text. We propose a classification model based on XLM-RoBERTa, and we evaluate it on the M4 dataset. The results show that the perplexity score is useful for the identification of machine generated text, but it is constrained by the differences among the LLMs used in the training and test sets.

pdf bib abs
Smart Lexical Search for Label Flipping Adversial Attack
Alberto Gutiérrez-Megías | Salud María Jiménez-Zafra | L. Alfonso Ureña | Eugenio Martínez-Cámara
Proceedings of the Fifth Workshop on Privacy in Natural Language Processing

Language models are susceptible to vulnerability through adversarial attacks, using manipulations of the input data to disrupt their performance. Accordingly, it represents a cibersecurity leak. Data manipulations are intended to be unidentifiable by the learning model and by humans, small changes can disturb the final label of a classification task. Hence, we propose a novel attack built upon explainability methods to identify the salient lexical units to alter in order to flip the classification label. We asses our proposal on a disinformation dataset, and we show that our attack reaches high balance among stealthiness and efficiency.

pdf bib abs
SINAI at SemEval-2024 Task 8: Fine-tuning on Words and Perplexity as Features for Detecting Machine Written Text
Alberto Gutiérrez Megías | L. Alfonso Ureña-lópez | Eugenio Martínez Cámara
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This work presents the proposed systems of the SINAI team for the subtask A of the Task 8 in SemEval 2024. We present the evaluation of two disparate systems, and our final submitted system. We claim that the perplexity value of a text may be used as classification signal. Accordingly, we conduct a study on the utility of perplexity for discerning text authorship, and we perform a comparative analysis of the results obtained on the datasets of the task. This comparative evaluation includes results derived from the systems evaluated, such as fine-tuning using an XLM-RoBERTa-Large transformer or using perplexity as a classification criterion. In addition, we discuss the results reached on the test set, where we show that there is large differences among the language probability distribution of the training and test sets. These analysis allows us to open new research lines to improve the detection of machine-generated text.

pdf bib abs
Federated Learning for Exploiting Annotators’ Disagreements in Natural Language Processing
Nuria Rodríguez-Barroso | Eugenio Martínez Cámara | Jose Camacho Collados | M. Victoria Luzón | Francisco Herrera
Transactions of the Association for Computational Linguistics, Volume 12

The annotation of ambiguous or subjective NLP tasks is usually addressed by various annotators. In most datasets, these annotations are aggregated into a single ground truth. However, this omits divergent opinions of annotators, hence missing individual perspectives. We propose FLEAD (Federated Learning for Exploiting Annotators’ Disagreements), a methodology built upon federated learning to independently learn from the opinions of all the annotators, thereby leveraging all their underlying information without relying on a single ground truth. We conduct an extensive experimental study and analysis in diverse text classification tasks to show the contribution of our approach with respect to mainstream approaches based on majority voting and other recent methodologies that also learn from annotator disagreements.

2022

In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media. TweetNLP supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition, as well as social media-specific tasks such as emoji prediction and offensive language identification. Task-specific systems are powered by reasonably-sized Transformer-based language models specialized on social media text (in particular, Twitter) which can be run without the need for dedicated hardware or cloud services. The main contributions of TweetNLP are: (1) an integrated Python library for a modern toolkit supporting social media analysis using our various task-specific models adapted to the social domain; (2) an interactive online demo for codeless experimentation using our models; and (3) a tutorial covering a wide variety of typical social media applications.

2018

pdf bib abs
SINAI at IEST 2018: Neural Encoding of Emotional External Knowledge for Emotion Classification
Flor Miriam Plaza-del-Arco | Eugenio Martínez-Cámara | Maite Martin | L. Alfonso Ureña- López
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In this paper, we describe our participation in WASSA 2018 Implicit Emotion Shared Task (IEST 2018). We claim that the use of emotional external knowledge may enhance the performance and the capacity of generalization of an emotion classification system based on neural networks. Accordingly, we submitted four deep learning systems grounded in a sequence encoding layer. They mainly differ in the feature vector space and the recurrent neural network used in the sequence encoding layer. The official results show that the systems that used emotional external knowledge have a higher capacity of generalization, hence our claim holds.

2017

We propose to move from Open Information Extraction (OIE) ahead to Open Knowledge Representation (OKR), aiming to represent information conveyed jointly in a set of texts in an open text-based manner. We do so by consolidating OIE extractions using entity and predicate coreference, while modeling information containment between coreferring elements via lexical entailment. We suggest that generating OKR structures can be a useful step in the NLP pipeline, to give semantic applications an easy handle on consolidated information across multiple texts.

The Story Cloze test is a recent effort in providing a common test scenario for text understanding systems. As part of the LSDSem 2017 shared task, we present a system based on a deep learning architecture combined with a rich set of manually-crafted linguistic features. The system outperforms all known baselines for the task, suggesting that the chosen approach is promising. We additionally present two methods for generating further training data based on stories from the ROCStories corpus.

pdf bib
Neural Disambiguation of Causal Lexical Markers Based on Context
Eugenio Martínez-Cámara | Vered Shwartz | Iryna Gurevych | Ido Dagan
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers