Crowdsourced Corpus with Entity Salience Annotations

Milan Dojchinovski, Dinesh Reddy, Tomáš Kliegr, Tomáš Vitvar, Harald Sack


Abstract
In this paper, we present a crowdsourced dataset that adds entity salience (importance) annotations to the Reuters-128 dataset, which is a subset of Reuters-21578. The dataset is distributed under a free license and published in the NLP Interchange Format, which fosters interoperability and re-use. We show the potential of the dataset on the task of learning an entity salience classifier and report the results of several experiments.
Anthology ID:
L16-1527
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3307–3311
URL:
https://aclanthology.org/L16-1527
Cite (ACL):
Milan Dojchinovski, Dinesh Reddy, Tomáš Kliegr, Tomáš Vitvar, and Harald Sack. 2016. Crowdsourced Corpus with Entity Salience Annotations. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3307–3311, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Crowdsourced Corpus with Entity Salience Annotations (Dojchinovski et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1527.pdf