Claudia Moro

2026

NormaTex-MapSNOMED: Bridging the Gap Between Brazilian Portuguese Clinical Narratives and SNOMED CT
Isabela Araujo | Claudia Moro | Layslla Martinez
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Clinical narratives written in free text contain valuable information for patient care. However, their unstructured nature and linguistic variability pose significant challenges for automatic processing and interoperability. In particular, mapping clinical terms to standardized terminologies such as SNOMED Clinical Terms (SNOMED CT) remains difficult for languages other than English, including Brazilian Portuguese. This paper presents NormaTex-MapSNOMED, a proposed component of the NormaTex framework that focuses on mapping clinical terms to predefined categories aligned with SNOMED CT. Given previously extracted terms, the method leverages large language models (LLMs) guided by a structured prompt to assign terms to target categories. Experiments were conducted on Portuguese-language clinical narratives and evaluated using three complementary strategies: lexical similarity based on Levenshtein distance, contextual similarity using a BERT-based model, and semantic validation using LLMs. The results show that LLM-based evaluation consistently outperforms lexical and contextual baselines across different models, with higher precision observed for disease-related terms compared to symptom-related expressions. These findings indicate that LLMs are a promising approach for semantic mapping of clinical terms in Brazilian Portuguese and can support clinical term normalization and interoperability with standardized terminologies.
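The abstract names Levenshtein distance as the lexical evaluation strategy but does not give the exact scoring rule. A minimal sketch follows, assuming similarity is normalized as 1 minus the edit distance divided by the length of the longer string, that case is ignored, and that a mapping counts as correct above a threshold. The normalization, the case folding, `is_lexical_match`, and the 0.8 threshold are illustrative assumptions, not details from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the rolling row short
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def lexical_similarity(predicted: str, reference: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer string.
    Case and surrounding whitespace are ignored (an assumption)."""
    a, b = predicted.casefold().strip(), reference.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_lexical_match(predicted: str, reference: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: accept the mapping above a fixed threshold."""
    return lexical_similarity(predicted, reference) >= threshold
```

For example, "Diabetes mellitus" and "diabetes melitus" differ by one character and score about 0.94, while the synonyms "dor de cabeça" and "cefaleia" fall well below the threshold. This blind spot for synonyms is one plausible reason to complement lexical scoring with contextual and LLM-based checks.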

2022


UC3M-PUCPR at SemEval-2022 Task 11: An Ensemble Method of Transformer-based Models for Complex Named Entity Recognition
Elisa Schneider | Renzo M. Rivera-Zavala | Paloma Martinez | Claudia Moro | Emerson Paraiso
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This study introduces the system submitted by the UC3M-PUCPR team to SemEval-2022 Task 11: MultiCoNER (Multilingual Complex Named Entity Recognition). We propose an ensemble of transformer-based models for entity recognition in cross-domain texts. Our deep learning method builds on the transformer architecture, whose attention mechanism captures long-range dependencies in the input text. The ensemble approach for named entity recognition (NER) outperformed the individual-model baselines on two of the three tracks in which we participated. For the code-mixed track, the ensemble achieves an overall F1-score of 76.36%, 2.85 percentage points higher than our best individual model for this task, XLM-RoBERTa-large (73.51%), and 18.26 points above the baseline provided for the shared task. Our preliminary results suggest that ensembles of contextualized language models can improve, even if modestly, the extraction of information from unstructured data.
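The abstract does not state how the individual models' outputs were combined. A minimal sketch of one common strategy, token-level majority voting over BIO labels, is shown below as an assumption rather than the team's method. Ties go to the first model (so models are assumed to be ordered strongest first), and `repair_bio` is a hypothetical post-processing step that turns orphan `I-` tags into `B-` tags so the voted sequence stays well-formed.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token BIO labels from several NER models by majority vote.
    Ties are broken in favour of the earliest model that predicted a winning label."""
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same tokens")
    combined = []
    for position in range(length):
        labels = [p[position] for p in predictions]
        counts = Counter(labels)
        top = max(counts.values())
        combined.append(next(label for label in labels if counts[label] == top))
    return combined


def repair_bio(labels: Sequence[str]) -> List[str]:
    """Turn an I-X that does not continue an entity of type X into B-X."""
    fixed: List[str] = []
    for label in labels:
        if label.startswith("I-"):
            prev = fixed[-1] if fixed else "O"
            if prev == "O" or prev[2:] != label[2:]:
                label = "B-" + label[2:]
        fixed.append(label)
    return fixed


def ensemble_ner(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Vote, then repair the resulting tag sequence."""
    return repair_bio(majority_vote(predictions))
```

Voting per token is simple but can break entity boundaries, which is why a repair pass (or span-level voting) is usually paired with it.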

2020


Contextualized French Language Models for Biomedical Named Entity Recognition
Jenny Copara | Julien Knafou | Nona Naderi | Claudia Moro | Patrick Ruch | Douglas Teodoro
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes

Named entity recognition (NER) is key for biomedical applications, as it enables knowledge discovery in free-text data. Because entities are semantic phrases, their meaning is conditioned on context, which helps resolve ambiguity. In this work, we explore contextualized language models for NER in French biomedical text as part of the Défi Fouille de Textes challenge. Our best approach achieved an F1-measure of 66% for the symptoms and signs and pathology categories, ranking first in subtask 1. For the anatomy, dose, exam, mode, moment, substance, treatment, and value categories, it achieved an F1-measure of 75% (subtask 2). Considering all categories, our model achieved the best result in the challenge, with an F1-measure of 72%. An ensemble of neural language models proved very effective, improving on a CRF baseline by up to 28% and on a single specialised language model by 4%.

2018


Portée de la négation : détection par apprentissage supervisé en français et portugais brésilien (Negation scope: sequence labeling by supervised learning in French and Brazilian Portuguese)
Clément Dalloux | Vincent Claveau | Natalia Grabar | Claudia Moro
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Automatic negation detection is often a prerequisite for information extraction systems, particularly in the biomedical domain. This article presents our contributions to detecting the scope of negation in French and Brazilian Portuguese. First, we present two corpora consisting mainly of excerpts from clinical trial protocols in French and Brazilian Portuguese, dedicated to patient inclusion criteria. Negation cues and their scopes were manually annotated. Second, we present a recurrent neural network approach to extract the scopes.
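Framing negation scope detection as sequence labeling means each token receives a label that a tagger (such as a recurrent network) learns to predict. A minimal data-preparation sketch follows, assuming a hypothetical three-label scheme (`CUE`, `IN`, `OUT`) and annotations given as token indices. The label names, helper functions, and example sentences are illustrative and are not taken from the corpora described above.

```python
from typing import List, Set


def to_scope_labels(n_tokens: int, cue: Set[int], scope: Set[int]) -> List[str]:
    """One label per token: CUE for negation cues, IN for tokens inside the
    scope, OUT otherwise. Cue positions take precedence over scope positions."""
    labels = []
    for i in range(n_tokens):
        if i in cue:
            labels.append("CUE")
        elif i in scope:
            labels.append("IN")
        else:
            labels.append("OUT")
    return labels


def scope_spans(tokens: List[str], labels: List[str]) -> List[str]:
    """Recover each contiguous run of IN labels as a text span,
    e.g. when reading back a tagger's predictions."""
    spans: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "IN":
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Invented French inclusion-criterion fragment: cue "sans", scope "antécédent de diabète".
tokens = ["Patient", "sans", "antécédent", "de", "diabète", "."]
labels = to_scope_labels(len(tokens), cue={1}, scope={2, 3, 4})
print(labels)                       # ['OUT', 'CUE', 'IN', 'IN', 'IN', 'OUT']
print(scope_spans(tokens, labels))  # ['antécédent de diabète']
```

A scope that surrounds its cue on both sides comes back as two spans here; a real system may prefer to keep it as one discontinuous scope.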

Co-authors

Emerson Cabrera Paraiso 1

Renzo M. Rivera-Zavala 1

Patrick Ruch 1

Elisa Schneider 1

Douglas Teodoro 1

