A Dataset for Open Event Extraction in English

Kiem-Hieu Nguyen; Xavier Tannier; Olivier Ferret; Romaric Besançon

Abstract

This article presents a corpus for the development and testing of event schema induction systems in English. Schema induction is the task of learning templates from unlabeled texts without supervision and of grouping together entities that correspond to the same role in a template. Most previous work on this subject relies on the MUC-4 corpus. We describe the limits of this corpus (size, non-representativeness, similarity of roles across templates) and propose a new, partially annotated corpus in English that remedies some of these shortcomings. We use Wikinews to select the data in the category Laws & Justice, and query the Google search engine to retrieve different documents about the same events. Only the Wikinews documents are manually annotated and can be used for evaluation, while the others can be used for unsupervised learning. We detail the methodology used for building the corpus and evaluate some existing systems on this new data.

Anthology ID:: L16-1307
Volume:: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:: May
Year:: 2016
Address:: Portorož, Slovenia
Editors:: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:: LREC
Publisher:: European Language Resources Association (ELRA)
Pages:: 1939–1943
URL:: https://aclanthology.org/L16-1307/
Cite (ACL):: Kiem-Hieu Nguyen, Xavier Tannier, Olivier Ferret, and Romaric Besançon. 2016. A Dataset for Open Event Extraction in English. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1939–1943, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):: A Dataset for Open Event Extraction in English (Nguyen et al., LREC 2016)
PDF:: https://aclanthology.org/L16-1307.pdf