MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language

Maria Mitrofan, Verginica Barbu Mititelu, Grigorina Mitrofan


Abstract
In an era when large amounts of data are generated daily in various fields, the biomedical field among them, linguistic resources can be exploited for a range of Natural Language Processing tasks. Moreover, an increasing number of biomedical documents are available in languages other than English. Extracting information from free-text natural language resources requires methods and tools for a variety of languages. This paper presents the creation of the MoNERo corpus, a gold standard biomedical corpus for Romanian, annotated with both part-of-speech tags and named entities. MoNERo comprises 154,825 morphologically annotated tokens and 23,188 entity annotations belonging to four entity semantic groups corresponding to UMLS Semantic Groups.
Anthology ID:
W19-5008
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
71–79
URL:
https://aclanthology.org/W19-5008
DOI:
10.18653/v1/W19-5008
Cite (ACL):
Maria Mitrofan, Verginica Barbu Mititelu, and Grigorina Mitrofan. 2019. MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 71–79, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
MoNERo: a Biomedical Gold Standard Corpus for the Romanian Language (Mitrofan et al., BioNLP 2019)
PDF:
https://aclanthology.org/W19-5008.pdf