César Teixeira


2019

Contributions to Clinical Named Entity Recognition in Portuguese
Fábio Lopes | César Teixeira | Hugo Gonçalo Oliveira
Proceedings of the 18th BioNLP Workshop and Shared Task

Bearing in mind that different languages may pose different challenges, this paper presents the following contributions to Information Extraction from clinical text, targeting the Portuguese language: a collection of 281 clinical texts in Portuguese, with manually annotated named entities; word embeddings trained on a larger collection of similar texts; and results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of in-domain and out-of-domain word embeddings for this task. Although the in-domain embeddings were learned from much less data, they yield higher performance. When tested on 20 independent clinical texts, the model with in-domain embeddings achieved better results than a model using larger out-of-domain embeddings.