Michael Poprat


2008

Semantic Annotations for Biology: a Corpus Development Initiative at the Jena University Language & Information Engineering (JULIE) Lab
Udo Hahn | Elena Beisswanger | Ekaterina Buyko | Michael Poprat | Katrin Tomanek | Joachim Wermter
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We provide an overview of corpus-building efforts at the Jena University Language & Information Engineering (JULIE) Lab that focus on life science documents. Special emphasis is placed on semantic annotations covering a large number of biomedical named entities (almost 100 entity types), semantic relations, and discourse phenomena, reference relations in particular.

Building a BioWordNet Using WordNet Data Structures and WordNet’s Software Infrastructure–A Failure Story
Michael Poprat | Elena Beisswanger | Udo Hahn
Software Engineering, Testing, and Quality Assurance for Natural Language Processing

2007

Quantitative Data on Referring Expressions in Biomedical Abstracts
Michael Poprat | Udo Hahn
Biological, translational, and clinical language processing

2006

Language Specific and Topic Focused Web Crawling
Olena Medelyan | Stefan Schulz | Jan Paetzold | Michael Poprat | Kornél Markó
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe an experiment on automatically collecting large language- and topic-specific corpora with a focused Web crawler. Our crawler combines efficient crawling techniques with a common text classification tool. Given a sample corpus of medical documents, we automatically extract query phrases and then acquire seed URLs with a standard search engine. Starting from these seed URLs, the crawler builds a new large collection consisting only of documents that satisfy both the language and the topic model. A manual analysis of the acquired English and German medical corpora reveals the high accuracy of the crawler. However, there are significant differences between the two languages.
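
The crawl loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier or the crawling techniques, so the stopword-ratio language check, the keyword-overlap topic check, the `focused_crawl` function, and the in-memory `TOY_WEB` are all hypothetical stand-ins. Query phrase extraction and seed acquisition through a search engine are skipped; the seed URLs are assumed to be given.

```python
from collections import deque

# Hypothetical stand-ins for the language and topic models; the abstract
# does not say which classifiers were actually used.
ENGLISH_STOPWORDS = {"the", "of", "and", "in", "is", "to", "a"}
MEDICAL_TERMS = {"patient", "therapy", "diagnosis", "clinical", "disease"}


def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,;:()").lower() for w in text.split()]


def is_english(text, threshold=0.2):
    """Crude language model: share of tokens that are English stopwords."""
    words = tokens(text)
    if not words:
        return False
    hits = sum(1 for w in words if w in ENGLISH_STOPWORDS)
    return hits / len(words) >= threshold


def is_medical(text, min_hits=2):
    """Crude topic model: number of distinct medical terms in the text."""
    return len(set(tokens(text)) & MEDICAL_TERMS) >= min_hits


def focused_crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl that keeps only pages passing both filters.

    fetch(url) must return (text, outgoing_links), or None on failure.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    corpus = {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        page = fetch(url)
        fetched += 1
        if page is None:
            continue
        text, links = page
        if not (is_english(text) and is_medical(text)):
            continue  # pages failing either model are neither kept nor expanded
        corpus[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return corpus


# Toy in-memory "web" so the sketch runs without network access.
TOY_WEB = {
    "a": ("The diagnosis of the disease is central to the therapy.", ["b", "c"]),
    "b": ("Der Patient und die Therapie sind wichtig.", ["d"]),
    "c": ("The patient is in clinical care and the therapy is going well.", ["d"]),
    "d": ("The price of the car is in the range of a budget.", []),
}

result = focused_crawl(["a"], TOY_WEB.get)
print(sorted(result))
```

In this toy run, page `b` is rejected by the language check and page `d` by the topic check, so only `a` and `c` end up in the collection. Expanding only accepted pages is one possible design choice for keeping the crawl focused; the abstract does not state whether the original crawler follows links from rejected pages.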