Paloma Martínez

Also published as: Paloma Martinez

2025

PTUK-HULAT at AraGenEval Shared Task: Fine-tuning XLM-RoBERTa for AI-Generated Arabic News Detection
Tasneem Duridi | Areej Jaber | Paloma Martínez
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

HULAT-UC3M at TSAR 2025 Shared Task A Prompt-Based Approach using Lightweight Language Models for Readability-Controlled Text Simplification
Jesus M. Sanchez-Gomez | Lourdes Moreno | Paloma Martínez | Marco Antonio Sanchez-Escudero
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

This paper describes the participation of the HULAT-UC3M team in the TSAR 2025 Shared Task on Readability-Controlled Text Simplification. Our approach uses open, lightweight Large Language Models (LLMs) of different sizes, together with two prompt engineering strategies. The proposed system was tested on the trial data provided and evaluated using the official metrics: CEFR Compliance, Meaning Preservation, and Similarity to References. The LLaMA 3 8B model with reinforced prompts was selected as our final submission and ranked fourteenth according to the overall metric. Finally, we discuss the main challenges that we identified in developing our approach for this task.

2024

HULAT-UC3M at BiolaySumm: Adaptation of BioBART and Longformer models to summarizing biomedical documents
Adrian Gonzalez Sanchez | Paloma Martínez
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

This article presents our submission to the BioLaySumm 2024 shared task: Lay Summarization of Biomedical Research Articles. The objective of this task is to generate concise, less technical summaries that facilitate comprehension by non-expert users. A pre-trained BioBART model was fine-tuned on the articles from the two journals, thereby generating two models, one for each journal. The submission ranked 12th in the task and attained first place in the Relevance ROUGE-1 metric.

2023

PTUK-HULAT at ArAIEval Shared Task Fine-tuned Distilbert to Predict Disinformative Tweets
Areej Jaber | Paloma Martinez
Proceedings of ArabicNLP 2023

Disinformation involves the dissemination of incomplete, inaccurate, or misleading information with the purpose of deliberately lying to others about the truth. The spread of disinformation on social media has serious implications and causes concern among internet users in several respects. Automatic classification models are required to detect disinformative posts on social media, especially on Twitter. In this article, a multilingual DistilBERT model was fine-tuned to classify tweets as either disinformative or not disinformative in Subtask 2A of the ArAIEval shared task. The system outperformed the baseline and achieved an F1 micro of 87% and an F1 macro of 80%. Our system ranked 11th among all participants.

2022

UC3M-PUCPR at SemEval-2022 Task 11: An Ensemble Method of Transformer-based Models for Complex Named Entity Recognition
Elisa Schneider | Renzo M. Rivera-Zavala | Paloma Martinez | Claudia Moro | Emerson Paraiso
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This study introduces the system submitted to SemEval 2022 Task 11: MultiCoNER (Multilingual Complex Named Entity Recognition) by the UC3M-PUCPR team. We proposed an ensemble of transformer-based models for entity recognition in cross-domain texts. Our deep learning method benefits from the transformer architecture, which adopts the attention mechanism to handle the long-range dependencies of the input text. Also, the ensemble approach for named entity recognition (NER) improved the results over baselines based on individual models on two of the three tracks we participated in. The ensemble model for the code-mixed task achieves an overall performance of 76.36% F1-score, a 2.85 percentage point increase over our best individual model for this task, XLM-RoBERTa-large (73.51%), and outperforms the baseline provided for the shared task by 18.26 points. Our preliminary results suggest that ensembles of contextualized language models can, even if modestly, improve the results in extracting information from unstructured data.

2020

Combining financial word embeddings and knowledge-based features for financial text summarization UC3M-MC System at FNS-2020
Jaime Baldeon Suarez | Paloma Martínez | Jose Luis Martínez
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

This paper describes the systems proposed by the HULAT research group from Universidad Carlos III de Madrid (UC3M) and the company MeaningCloud (MC) to solve the FNS 2020 Shared Task on summarizing financial reports. We present a narrative extractive approach that implements a statistical model comprising different features that measure the relevance of the sentences using a combination of statistical and machine learning methods. The key to the model’s performance is its accurate representation of the text, since the word embeddings used by the model have been trained with the summaries of the training dataset and therefore capture the most salient information from the reports. The systems’ code can be found at https://github.com/jaimebaldeon/FNS-2020.

2019

Deep neural model with enhanced embeddings for pharmaceutical and chemical entities recognition in Spanish clinical text
Renzo Rivera | Paloma Martínez
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

In this work, we introduce a Deep Learning architecture for pharmaceutical and chemical Named Entity Recognition in Spanish clinical case texts. We propose a hybrid model approach based on two Bidirectional Long Short-Term Memory (Bi-LSTM) networks and a Conditional Random Field (CRF) network, using character, word, concept and sense embeddings to deal with the extraction of semantic, syntactic and morphological features. The approach was evaluated on the PharmaCoNER Corpus, obtaining an F-measure of 85.24% for subtask 1 and 49.36% for subtask 2. These results show that deep learning methods with domain-specific embedding representations can outperform the state-of-the-art approaches.

2017

Exploring Convolutional Neural Networks for Sentiment Analysis of Spanish tweets
Isabel Segura-Bedmar | Antonio Quirós | Paloma Martínez
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Spanish is the third most used language on the internet, after English and Chinese, with a total share of 7.7% (more than 277 million users) and internet growth of more than 1,400%. However, most work on sentiment analysis has focused on English. This paper describes a deep learning system for Spanish sentiment analysis. To the best of our knowledge, this is the first work that explores the use of a convolutional neural network for polarity classification of Spanish tweets.

LABDA at SemEval-2017 Task 10: Extracting Keyphrases from Scientific Publications by combining the BANNER tool and the UMLS Semantic Network
Isabel Segura-Bedmar | Cristóbal Colón-Ruiz | Paloma Martínez
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the system presented by the LABDA group at SemEval 2017 Task 10 ScienceIE, specifically for the subtasks of identification and classification of keyphrases from scientific articles. For the identification task, we use the BANNER tool, a named entity recognition system that is based on conditional random fields (CRF) and has obtained successful results in the biomedical domain. To classify keyphrases, we study the UMLS semantic network and propose a possible linking between the keyphrase types and the UMLS semantic groups. Based on this semantic linking, we create a dictionary for each keyphrase type. Then, a feature indicating whether a token is found in one of these dictionaries is incorporated into the feature set used by the BANNER tool. The final results on the test dataset show that our system still needs to be improved, but conditional random fields and, consequently, the BANNER system can be used as a first approximation to identify and classify keyphrases.

LABDA at SemEval-2017 Task 10: Relation Classification between keyphrases via Convolutional Neural Network
Víctor Suárez-Paniagua | Isabel Segura-Bedmar | Paloma Martínez
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we describe our participation in the subtask of extracting relationships between two identified keyphrases. This task can be very helpful in improving search engines for scientific articles. Our approach is based on the use of a convolutional neural network (CNN) trained on the training dataset. This deep learning model has already achieved successful results for the extraction of relationships between named entities. Thus, our hypothesis is that this model can also be applied to extract relations between keyphrases. The official results of the task show that our architecture obtained an F1-score of 0.38% for Keyphrases Relation Classification. This performance is lower than expected due to the generic preprocessing phase and the basic configuration of the CNN model; more complex architectures are proposed as future work to increase the classification rate.

2016

LABDA at the 2016 BioASQ challenge task 4a: Semantic Indexing by using ElasticSearch
Isabel Segura-Bedmar | Adrián Carruana | Paloma Martínez
Proceedings of the Fourth BioASQ workshop

2015

Exploring Word Embedding for Drug Name Recognition
Isabel Segura-Bedmar | Víctor Suárez-Paniagua | Paloma Martínez
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2014

Extracting drug indications and adverse drug reactions from Spanish health social media
Isabel Segura-Bedmar | Santiago de la Peña González | Paloma Martínez
Proceedings of BioNLP 2014

Detecting drugs and adverse events from Spanish social media streams
Isabel Segura-Bedmar | Ricardo Revert | Paloma Martínez
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

Dating of contents is relevant to multiple advanced Natural Language Processing (NLP) applications, such as Information Retrieval or Question Answering. These could be improved by using techniques that consider a temporal dimension in their processes. To achieve this, temporal expressions in data sources must first be detected accurately and then represented in an appropriate standard format that captures the time value of the expressions once resolved and allows reasoning without ambiguity, in order to increase the range of search and the quality of the results returned. These tasks are necessary for NLP applications if efficient temporal reasoning is expected afterwards. This work presents a typology of time expressions based on an empirical inductive approach, both from a structural perspective and from the point of view of their resolution. Furthermore, a method for the automatic recognition and resolution of temporal expressions in Spanish contents is provided, obtaining promising results when tested on an evaluation corpus.

A preliminary approach to extract drugs by combining UMLS resources and USAN naming conventions
Isabel Segura-Bedmar | Paloma Martínez | Doaa Samy
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing