Bootstrapping a Romanian Corpus for Medical Named Entity Recognition

Maria Mitrofan


Abstract
Named Entity Recognition (NER) is an important component of natural language processing (NLP), with applications in the biomedical domain, where it enables knowledge discovery from medical texts. Because only a few linguistic resources specific to the biomedical domain exist for Romanian, we created a sub-corpus dedicated to this domain. In this paper we present this newly developed Romanian sub-corpus for medical-domain NER, a valuable asset for biomedical text processing. We describe the sub-corpus, provide informative statistics about its data composition, and evaluate an automatic NER tool on the newly created resource.
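The abstract mentions two technical steps: reporting data-composition statistics and evaluating an NER tool against the annotated corpus. The sketch below illustrates one common way to do both; it is not the paper's method. It assumes entities are stored as (start token, exclusive end token, type) spans, counts annotations per entity type, and computes entity-level precision, recall and F1 under exact span-and-type matching. The entity type names are hypothetical placeholders, not the corpus's actual label set.

```python
from collections import Counter
from typing import List, Tuple

# Assumed representation: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def entity_counts(spans: List[Span]) -> Counter:
    """Count annotated entities per type (a simple data-composition statistic)."""
    return Counter(label for _, _, label in spans)


def exact_match_prf(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1; a prediction counts only if
    both its boundaries and its type match a gold entity exactly."""
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: one correct prediction, one prediction with the wrong
# type, and one gold entity the tool missed entirely.
gold_example = [(0, 2, "DISORDER"), (5, 6, "DRUG"), (8, 9, "ANATOMY")]
pred_example = [(0, 2, "DISORDER"), (5, 6, "ANATOMY")]
print(entity_counts(gold_example))
print(exact_match_prf(gold_example, pred_example))
```

Exact matching is the strictest criterion; relaxed variants that credit partial boundary overlap are also common and would yield higher scores on the same predictions.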
Anthology ID: R17-1066
Volume: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month: September
Year: 2017
Address: Varna, Bulgaria
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 501–509
URL: https://doi.org/10.26615/978-954-452-049-6_066
DOI: 10.26615/978-954-452-049-6_066