SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis

Ina Roesiger


Abstract

This paper presents SciCorp, a corpus of full-text English scientific papers from two disciplines, genetics and computational linguistics. The corpus contains co-reference and bridging annotations as well as information status labels. Since SciCorp is annotated with both the labels and the corresponding co-reference and bridging links, we believe it is a valuable resource for NLP researchers working on scientific articles or on applications such as co-reference resolution, bridging resolution or information status classification. The corpus has been reliably annotated by independent human coders with moderate inter-annotator agreement (average kappa = 0.71). In total, we have annotated 14 full papers containing 61,045 tokens and marked 8,708 definite noun phrases. The paper describes the annotation scheme and the resulting corpus in detail. The corpus is available for download in two formats: an offset-based format and, for the co-reference annotations, the widely used tabular CoNLL-2012 format.
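Because the co-reference annotations are distributed in the tabular CoNLL-2012 format, the following minimal sketch shows one way to turn that layout into mention clusters. It assumes the usual CoNLL-2012 convention: one token per line, co-reference tags in the last column, `-` for "no mention boundary", `(7` / `7)` / `(7)` for opening, closing and single-token mentions of chain 7, and `|` separating multiple tags on one token. The exact column layout of the SciCorp files is not described here, and the function names `coref_tags_from_lines` and `parse_coref_column` are hypothetical.

```python
from collections import defaultdict


def coref_tags_from_lines(lines):
    """Collect the last column of every token line (assumed to hold the
    co-reference tag), skipping blank sentence separators and the
    '#begin document' / '#end document' marker lines."""
    tags = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tags.append(line.split()[-1])
    return tags


def parse_coref_column(tags):
    """Map chain id -> list of (start, end) token spans, end inclusive.

    Token indices are positions in `tags`, i.e. counted over the whole
    document rather than per sentence."""
    chains = defaultdict(list)
    open_starts = defaultdict(list)  # chain id -> stack of open start indices
    for i, tag in enumerate(tags):
        if tag == "-":
            continue  # token does not start or end any mention
        for part in tag.split("|"):
            if part.startswith("(") and part.endswith(")"):
                # single-token mention, e.g. "(3)"
                chains[int(part[1:-1])].append((i, i))
            elif part.startswith("("):
                # mention of this chain opens here, e.g. "(3"
                open_starts[int(part[1:])].append(i)
            elif part.endswith(")"):
                # closes the most recently opened mention of this chain, e.g. "3)"
                cid = int(part[:-1])
                chains[cid].append((open_starts[cid].pop(), i))
    return dict(chains)
```

For example, the tag sequence `["(1", "-", "1)", "-", "(1)"]` yields one chain with the two mentions `(0, 2)` and `(4, 4)`. Using a stack per chain id keeps nested mentions of the same chain correctly paired.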

Anthology ID:: L16-1275
Volume:: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:: May
Year:: 2016
Address:: Portorož, Slovenia
Editors:: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:: LREC
Publisher:: European Language Resources Association (ELRA)
Pages:: 1743–1749
URL:: https://aclanthology.org/L16-1275/
Cite (ACL):: Ina Roesiger. 2016. SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1743–1749, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):: SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis (Roesiger, LREC 2016)
PDF:: https://aclanthology.org/L16-1275.pdf
