Jurica Seva

Also published as: Jurica Ševa


2020

Named Entities in Medical Case Reports: Corpus and Experiments
Sarah Schulz | Jurica Ševa | Samuel Rodriguez | Malte Ostendorff | Georg Rehm
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central’s open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables initial investigations of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities, enabled by the annotated dataset.

2018

Identifying Key Sentences for Precision Oncology Using Semi-Supervised Learning
Jurica Ševa | Martin Wackerbauer | Ulf Leser
Proceedings of the BioNLP 2018 workshop

We present a machine learning pipeline that identifies key sentences in abstracts of oncological articles to aid evidence-based medicine. This problem is characterized by the lack of gold standard datasets, data imbalance and thematic differences between available silver standard corpora. Additionally, the available training and target data differ with regard to their domain (professional summaries vs. sentences in abstracts). This makes supervised machine learning inapplicable. We propose the use of two semi-supervised machine learning approaches: to mitigate difficulties arising from heterogeneous data sources, overcome data imbalance and create reliable training data, we propose transductive learning from positive and unlabelled data (PU Learning). To obtain a realistic classification model, we propose Self-Training, using abstracts summarised in relevant sentences as unlabelled examples. The best model achieves 84% accuracy and 0.84 F1 score on our dataset.
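The PU Learning step in the abstract above can be sketched as follows. This is an illustrative example of one standard PU technique (the Elkan–Noto correction) on synthetic data, not the authors' exact pipeline; all variable names and the toy data are assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic 2-D data: positives cluster around (2, 2), negatives around (-2, -2)
pos = rng.normal(2, 1, size=(100, 2))
neg = rng.normal(-2, 1, size=(100, 2))

# PU setting: only a subset of positives carries a label; the rest are unlabelled
labelled_pos = pos[:30]
unlabelled = np.vstack([pos[30:], neg])  # hidden positives mixed with negatives

# Step 1: fit a "non-traditional" classifier treating unlabelled as negative
X = np.vstack([labelled_pos, unlabelled])
s = np.hstack([np.ones(len(labelled_pos)), np.zeros(len(unlabelled))])
clf = LogisticRegression().fit(X, s)

# Step 2: estimate c = P(s=1 | y=1) as the mean score on labelled positives
c = clf.predict_proba(labelled_pos)[:, 1].mean()

# Corrected probability that an unlabelled point is truly positive
p_true = clf.predict_proba(unlabelled)[:, 1] / c
```

The corrected scores `p_true` can then be thresholded to harvest reliable positive and negative examples, which is the kind of training data the transductive PU step is meant to produce before Self-Training is applied.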

2016

USFD at SemEval-2016 Task 1: Putting different State-of-the-Arts into a Box
Ahmet Aker | Frederic Blain | Andres Duque | Marina Fomicheva | Jurica Seva | Kashif Shah | Daniel Beck
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)