Sara Mamprin


Information structure in the Potsdam Commentary Corpus: Topics
Manfred Stede | Sara Mamprin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Potsdam Commentary Corpus is a collection of 175 German newspaper commentaries annotated on a variety of layers. This paper introduces a new layer that covers the linguistic notion of information-structural topic (not to be confused with ‘topic’ as applied to documents in information retrieval). To our knowledge, this is the first larger topic-annotated resource for German (and one of the first for any language). We describe the annotation guidelines, the annotation process, and the results of an inter-annotator agreement study, which compare favourably with related work. The annotated corpus is freely available for research.