Information structure in the Potsdam Commentary Corpus: Topics

Manfred Stede, Sara Mamprin


Abstract
The Potsdam Commentary Corpus is a collection of 175 German newspaper commentaries annotated on a variety of layers. This paper introduces a new layer that covers the linguistic notion of information-structural topic (not to be confused with ‘topic’ as applied to documents in information retrieval). To our knowledge, this is the first larger topic-annotated resource for German (and one of the first for any language). We describe the annotation guidelines, the annotation process, and the results of an inter-annotator agreement study, which compare favourably with related work. The annotated corpus is freely available for research.
Anthology ID:
L16-1271
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1718–1723
URL:
https://aclanthology.org/L16-1271
Cite (ACL):
Manfred Stede and Sara Mamprin. 2016. Information structure in the Potsdam Commentary Corpus: Topics. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1718–1723, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Information structure in the Potsdam Commentary Corpus: Topics (Stede & Mamprin, LREC 2016)
PDF:
https://aclanthology.org/L16-1271.pdf