Contributions to Clinical Named Entity Recognition in Portuguese

Fábio Lopes, César Teixeira, Hugo Gonçalo Oliveira


Abstract
Because different languages may pose different challenges, this paper presents the following contributions to Information Extraction from clinical text in Portuguese: a collection of 281 clinical texts in this language, with manually annotated named entities; word embeddings trained on a larger collection of similar texts; and results of using BiLSTM-CRF neural networks for named entity recognition on the annotated collection, including a comparison of in-domain and out-of-domain word embeddings in this task. Although the in-domain embeddings were learned from much less data, performance is higher when using them. When tested on 20 independent clinical texts, the model using in-domain embeddings achieved better results than a model using larger out-of-domain embeddings.
Anthology ID:
W19-5024
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
223–233
URL:
https://aclanthology.org/W19-5024
DOI:
10.18653/v1/W19-5024
Cite (ACL):
Fábio Lopes, César Teixeira, and Hugo Gonçalo Oliveira. 2019. Contributions to Clinical Named Entity Recognition in Portuguese. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 223–233, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Contributions to Clinical Named Entity Recognition in Portuguese (Lopes et al., BioNLP 2019)
PDF:
https://aclanthology.org/W19-5024.pdf
Code:
fabioacl/PortugueseClinicalNER