Named Entities in Medical Case Reports: Corpus and Experiments

Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff, Georg Rehm


Abstract
We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central’s open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities made available through the annotated dataset.
Anthology ID:
2020.lrec-1.553
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4495–4500
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.553
Cite (ACL):
Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff, and Georg Rehm. 2020. Named Entities in Medical Case Reports: Corpus and Experiments. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 4495–4500, Marseille, France. European Language Resources Association.
Cite (Informal):
Named Entities in Medical Case Reports: Corpus and Experiments (Schulz et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.553.pdf
Data:
Medical Case Report Corpus, BC5CDR