WN-Salience: A Corpus of News Articles with Entity Salience Annotations

Chuan Wu, Evangelos Kanoulas, Maarten de Rijke, Wei Lu


Abstract
Entities can be found in various text genres, ranging from tweets and web pages to user queries submitted to web search engines. Existing research either treats all entities in a text as equally important or relies on heuristics to measure their salience. We believe that a key reason for the relatively limited work on entity salience is the lack of appropriate datasets. To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking. WN-Salience is built on top of Wikinews, a Wikimedia project whose mission is to present reliable news articles. Entities in Wikinews articles are identified by the authors of the articles and are linked to Wikinews categories when they are salient or to Wikipedia pages otherwise. The dataset is built automatically and consists of approximately 7,000 news articles and 90,000 in-text entity annotations. We compare the WN-Salience dataset against existing datasets for the task and analyze their differences. Furthermore, we conduct experiments on entity salience detection; the results demonstrate that WN-Salience is a challenging testbed that is complementary to existing ones.
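The labeling rule described in the abstract (a link to a Wikinews category marks a salient entity, a link to a Wikipedia page marks a non-salient one) can be illustrated with a minimal sketch. This is not the authors' extraction pipeline. It assumes that in-text links appear in wikitext form as `[[target]]` or `[[target|anchor]]`, and that a `w:` prefix on the target denotes a Wikipedia link while any other target is a Wikinews-internal one. The names `LINK_RE`, `EntityAnnotation`, and `label_entities` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Assumption: links look like [[target]] or [[target|anchor]].
# Assumption: a "w:" prefix marks a Wikipedia link; every other target is
# treated as a Wikinews-internal (category) link. Illustrative only.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


class EntityAnnotation(NamedTuple):
    mention: str   # surface text shown in the article
    target: str    # linked page title, without the "w:" prefix
    salient: bool  # True when the link stays inside Wikinews


def label_entities(wikitext: str) -> List[EntityAnnotation]:
    """Derive salience labels from the link targets in one article."""
    annotations = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        anchor = match.group(2)
        is_wikipedia = target.lower().startswith("w:")
        if is_wikipedia:
            target = target[2:].strip()
        # Fall back to the target title when no anchor text is given.
        mention = anchor.strip() if anchor else target
        annotations.append(EntityAnnotation(mention, target, not is_wikipedia))
    return annotations


sample = "[[Barack Obama]] met [[w:Angela Merkel|Merkel]] in [[w:Berlin]]."
for annotation in label_entities(sample):
    print(annotation)
```

Under these assumptions, only the first link in the sample is labeled salient, because it is the only one that does not point to Wikipedia.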
Anthology ID:
2020.lrec-1.257
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2095–2102
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.257
Cite (ACL):
Chuan Wu, Evangelos Kanoulas, Maarten de Rijke, and Wei Lu. 2020. WN-Salience: A Corpus of News Articles with Entity Salience Annotations. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2095–2102, Marseille, France. European Language Resources Association.
Cite (Informal):
WN-Salience: A Corpus of News Articles with Entity Salience Annotations (Wu et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.257.pdf