Salvador Lima-López

Also published as: Salvador Lima Lopez, Salvador Lima Lopez, Salvador Lima López, Salvador Lima-lopez


2022

pdf bib
Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports
Mariana Neves | Antonio Jimeno Yepes | Amy Siu | Roland Roller | Philippe Thomas | Maika Vicente Navarro | Lana Yeganova | Dina Wiemann | Giorgio Maria Di Nunzio | Federica Vezzani | Christel Gerardin | Rachel Bawden | Darryl Johan Estrada | Salvador Lima-lopez | Eulalia Farre-maduel | Martin Krallinger | Cristian Grozea | Aurelie Neveol
Proceedings of the Seventh Conference on Machine Translation (WMT)

In the seventh edition of the WMT Biomedical Task, we addressed a total of seven languagepairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, English/Italian. This year’s test sets covered three types of biomedical text genre. In addition to scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical cases translations were given special attention by involving clinicians in the preparation of reference translations and manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. For the ClinSpEn sub-task, we had the participation of five teams.

pdf bib
The SocialDisNER shared task on detection of disease mentions in health-relevant content from social media: methods, evaluation, guidelines and corpora
Luis Gasco Sánchez | Darryl Estrada Zavala | Eulàlia Farré-Maduell | Salvador Lima-López | Antonio Miranda-Escalada | Martin Krallinger
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

There is a pressing need to exploit health-related content from social media, a global source of data where key health information is posted directly by citizens, patients and other healthcare stakeholders. Use cases of disease related social media mining include disease outbreak/surveillance, mental health and pharmacovigilance. Current efforts address the exploitation of social media beyond English. The SocialDisNER task, organized as part of the SMM4H 2022 initiative, has applied the LINKAGE methodology to select and annotate a Gold Standard corpus of 9,500 tweets in Spanish enriched with disease mentions generated by patients and medical professionals. As a complementary resource for teams participating in the SocialDisNER track, we have also created a large-scale corpus of 85,000 tweets, where in addition to disease mentions, other medical entities of relevance (e.g., medications, symptoms and procedures, among others) have been automatically labelled. Using these large-scale datasets, co-mention networks or knowledge graphs were released for each entity pair type. Out of the 47 teams registered for the task, 17 teams uploaded a total of 32 runs. The top-performing team achieved a very competitive 0.891 f-score, with a system trained following a continue pre-training strategy. We anticipate that the corpus and systems resulting from the SocialDisNER track might further foster health related text mining of social media content in Spanish and inspire disease detection strategies in other languages.

2021

pdf bib
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Arjun Magge | Ari Klein | Antonio Miranda-Escalada | Mohammed Ali Al-garadi | Ilseyar Alimova | Zulfat Miftahutdinov | Eulalia Farre-Maduell | Salvador Lima Lopez | Ivan Flores | Karen O'Connor | Davy Weissenbacher | Elena Tutubalina | Abeed Sarker | Juan M Banda | Martin Krallinger | Graciela Gonzalez-Hernandez
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

pdf bib
The ProfNER shared task on automatic recognition of occupation mentions in social media: systems, evaluation, guidelines, embeddings and corpora
Antonio Miranda-Escalada | Eulàlia Farré-Maduell | Salvador Lima-López | Luis Gascó | Vicent Briva-Iglesias | Marvin Agüero-Torales | Martin Krallinger
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

Detection of occupations in texts is relevant for a range of important application scenarios, like competitive intelligence, sociodemographic analysis, legal NLP or health-related occupational data mining. Despite the importance and heterogeneous data types that mention occupations, text mining efforts to recognize them have been limited. This is due to the lack of clear annotation guidelines and high-quality Gold Standard corpora. Social media data can be regarded as a relevant source of information for real-time monitoring of at-risk occupational groups in the context of pandemics like the COVID-19 one, facilitating intervention strategies for occupations in direct contact with infectious agents or affected by mental health issues. To evaluate current NLP methods and to generate resources, we have organized the ProfNER track at SMM4H 2021, providing ProfNER participants with a Gold Standard corpus of manually annotated tweets (human IAA of 0.919) following annotation guidelines available in Spanish and English, an occupation gazetteer, a machine-translated version of tweets, and FastText embeddings. Out of 35 registered teams, 11 submitted a total of 27 runs. Best-performing participants built systems based on recent NLP technologies (e.g. transformers) and achieved 0.93 F-score in Text Classification and 0.839 in Named Entity Recognition. Corpus: https://doi.org/10.5281/zenodo.4309356

pdf bib
Overview of the Sixth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at NAACL 2021
Arjun Magge | Ari Klein | Antonio Miranda-Escalada | Mohammed Ali Al-Garadi | Ilseyar Alimova | Zulfat Miftahutdinov | Eulalia Farre | Salvador Lima López | Ivan Flores | Karen O’Connor | Davy Weissenbacher | Elena Tutubalina | Abeed Sarker | Juan Banda | Martin Krallinger | Graciela Gonzalez-Hernandez
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

The global growth of social media usage over the past decade has opened research avenues for mining health related information that can ultimately be used to improve public health. The Social Media Mining for Health Applications (#SMM4H) shared tasks in its sixth iteration sought to advance the use of social media texts such as Twitter for pharmacovigilance, disease tracking and patient centered outcomes. #SMM4H 2021 hosted a total of eight tasks that included reruns of adverse drug effect extraction in English and Russian and newer tasks such as detecting medication non-adherence from Twitter and WebMD forum, detecting self-reported adverse pregnancy outcomes, detecting cases and symptoms of COVID-19, identifying occupations mentioned in Spanish by Twitter users, and detecting self-reported breast cancer diagnosis. The eight tasks included a total of 12 individual subtasks spanning three languages requiring methods for binary classification, multi-class classification, named entity recognition and entity normalization. With a total of 97 registering teams and 40 teams submitting predictions, the interest in the shared tasks grew by 70% and participation grew by 38% compared to the previous iteration.

2020

pdf bib
NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts
Salvador Lima Lopez | Naiara Perez | Montse Cuadros | German Rigau
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of an on-going research and currently consists of 29,682 sentences obtained from anonymised health records annotated with negation and uncertainty. The article includes an exhaustive comparison with similar corpora in Spanish, and presents the main annotation and design decisions. Additionally, we perform preliminary experiments using deep learning algorithms to validate the annotated dataset. As far as we know, NUBes is the largest available corpora for negation in Spanish and the first that also incorporates the annotation of speculation cues, scopes, and events.

pdf bib
HitzalMed: Anonymisation of Clinical Text in Spanish
Salvador Lima Lopez | Naiara Perez | Laura García-Sardiña | Montse Cuadros
Proceedings of the Twelfth Language Resources and Evaluation Conference

HitzalMed is a web-framed tool that performs automatic detection of sensitive information in clinical texts using machine learning algorithms reported to be competitive for the task. Moreover, once sensitive information is detected, different anonymisation techniques are implemented that are configurable by the user –for instance, substitution, where sensitive items are replaced by same category text in an effort to generate a new document that looks as natural as the original one. The tool is able to get data from different document formats and outputs downloadable anonymised data. This paper presents the anonymisation and substitution technology and the demonstrator which is publicly available at https://snlt.vicomtech.org/hitzalmed.