Alicia Pérez

Also published as: Alicia Perez

2024

pdf bib abs
Ixa-Med at Discharge Me! Retrieval-Assisted Generation for Streamlining Discharge Documentation
Jordan C. Koontz | Maite Oronoz | Alicia Pérez
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

In this paper we present our system for the BioNLP ACL’24 “Discharge Me!” task on automating discharge summary section generation. Using Retrieval-Augmented Generation, we combine a Large Language Model (LLM) with external knowledge to guide the generation of the target sections. Our approach generates structured patient summaries from discharge notes using an instructed LLM, retrieves relevant “Brief Hospital Course” and “Discharge Instructions” examples via BM25 and SentenceBERT, and provides this context to a frozen LLM for generation. Our top system using SentenceBERT retrieval achieves an overall score of 0.183, outperforming zero-shot baselines. We analyze performance across different aspects, discussing limitations and future research directions.

2023

pdf bib abs
Evaluating Data Augmentation for Medication Identification in Clinical Notes
Jordan Koontz | Maite Oronoz | Alicia Pérez
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

We evaluate the effectiveness of using data augmentation to improve the generalizability of a Named Entity Recognition model for the task of medication identification in clinical notes. We compare disparate data augmentation methods, namely mention-replacement and a generative model, for creating synthetic training examples. Through experiments on the n2c2 2022 Track 1 Contextualized Medication Event Extraction data set, we show that data augmentation with supplemental examples created with GPT-3 can boost the performance of a transformer-based model for small training sets.

2022

pdf bib abs
Approximate Nearest Neighbour Extraction Techniques and Neural Networks for Suicide Risk Prediction in the CLPsych 2022 Shared Task
Hermenegildo Fabregat Marcos | Ander Cejudo | Juan Martinez-romo | Alicia Perez | Lourdes Araujo | Nuria Lebea | Maite Oronoz | Arantza Casillas
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology

This paper describes the participation of our group on the CLPsych 2022 shared task. For task A, which tries to capture changes in mood over time, we have applied an Approximate Nearest Neighbour (ANN) extraction technique with the aim of relabelling the user messages according to their proximity, based on the representation of these messages in a vector space. Regarding the subtask B, we have used the output of the subtask A to train a Recurrent Neural Network (RNN) to predict the risk of suicide at the user level. The results obtained are very competitive considering that our team was one of the few that made use of the organisers’ proposed virtual environment and also made use of the Task A output to predict the Task B results.

2021

pdf bib abs
On the Contribution of Per-ICD Attention Mechanisms to Classify Health Records in Languages with Fewer Resources than English
Alberto Blanco | Sonja Remmer | Alicia Pérez | Hercules Dalianis | Arantza Casillas
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

We introduce a multi-label text classifier with per-label attention for the classification of Electronic Health Records according to the International Classification of Diseases. We apply the model on two Electronic Health Records datasets with Discharge Summaries in two languages with fewer resources than English, Spanish and Swedish. Our model leverages the BERT Multilingual model (specifically the Wikipedia, as the model have been trained with 104 languages, including Spanish and Swedish, with the largest Wikipedia dumps) to share the language modelling capabilities across the languages. With the per-label attention, the model can compute the relevance of each word from the EHR towards the prediction of each label. For the experimental framework, we apply 157 labels from Chapter XI – Diseases of the Digestive System of the ICD, which makes the attention especially important as the model has to discriminate between similar diseases. 1 https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages

2016

pdf bib abs
The impact of simple feature engineering in multilingual medical NER
Rebecka Weegar | Arantza Casillas | Arantza Diaz de Ilarraza | Maite Oronoz | Alicia Pérez | Koldo Gojenola
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

The goal of this paper is to examine the impact of simple feature engineering mechanisms before applying more sophisticated techniques to the task of medical NER. Sometimes papers using scientifically sound techniques present raw baselines that could be improved adding simple and cheap features. This work focuses on entity recognition for the clinical domain for three languages: English, Swedish and Spanish. The task is tackled using simple features, starting from the window size, capitalization, prefixes, and moving to POS and semantic tags. This work demonstrates that a simple initial step of feature engineering can improve the baseline results significantly. Hence, the contributions of this paper are: first, a short list of guidelines well supported with experimental results on three languages and, second, a detailed description of the relevance of these features for medical NER.

pdf bib abs
Fully unsupervised low-dimensional representation of adverse drug reaction events through distributional semantics
Alicia Pérez | Arantza Casillas | Koldo Gojenola
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

Electronic health records show great variability since the same concept is often expressed with different terms, either scientific latin forms, common or lay variants and even vernacular naming. Deep learning enables distributional representation of terms in a vector-space, and therefore, related terms tend to be close in the vector space. Accordingly, embedding words through these vectors opens the way towards accounting for semantic relatedness through classical algebraic operations. In this work we propose a simple though efficient unsupervised characterization of Adverse Drug Reactions (ADRs). This approach exploits the embedding representation of the terms involved in candidate ADR events, that is, drug-disease entity pairs. In brief, the ADRs are represented as vectors that link the drug with the disease in their context through a recursive additive model. We discovered that a low-dimensional representation that makes use of the modulus and argument of the embedded representation of the ADR event shows correlation with the manually annotated class. Thus, it can be derived that this characterization results in to be beneficial for further classification tasks as predictive features.

The goal of this work is to improve current translation models by taking into account additional knowledge sources such as semantically motivated segmentation or statistical categorization. Specifically, two different approaches are discussed. On the one hand, phrase-based approach, and on the other hand, categorization. For both approaches, both statistical and linguistic alternatives are explored. As for translation framework, finite-state transducers are considered. These are versatile models that can be easily integrated on-the-fly with acoustic models for speech translation purposes. In what the experimental framework concerns, all the models presented were evaluated and compared taking confidence intervals into account.

pdf bib
An Integrated Architecture for Speech-Input Multi-Target Machine Translation
Alicia Pérez | M. Teresa González | M. Inés Torres | Francisco Casacuberta
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf bib
Speech-Input Multi-Target Machine Translation
Alicia Pérez | M. Teresa González | M. Inés Torres | Francisco Casacuberta
Proceedings of the Second Workshop on Statistical Machine Translation