WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles

Abbas Ghaddar, Phillippe Langlais


Abstract
This paper presents WikiCoref, an English corpus annotated for anaphoric relations, in which all documents are drawn from the English version of Wikipedia. Our annotation scheme follows that of OntoNotes, with a few differences. We annotated each markable with its coreference type, its mention type, and the equivalent Freebase topic. Since most similar annotation efforts concentrate on very specific types of written text, mainly newswire, there is a lack of annotated resources for Wikipedia texts, which are otherwise heavily used. The corpus described in this paper addresses this issue. We present a freely available resource that we initially devised for improving coreference resolution algorithms dedicated to Wikipedia texts. Our corpus places no restriction on the topics of the documents being annotated, and documents of various sizes have been considered for annotation.
Anthology ID:
L16-1021
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
136–142
URL:
https://aclanthology.org/L16-1021
Cite (ACL):
Abbas Ghaddar and Phillippe Langlais. 2016. WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 136–142, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
WikiCoref: An English Coreference-annotated Corpus of Wikipedia Articles (Ghaddar & Langlais, LREC 2016)
PDF:
https://aclanthology.org/L16-1021.pdf
Data
WikiCoref