Crowdsourcing Salient Information from News and Tweets

Oana Inel, Tommaso Caselli, Lora Aroyo


Abstract
The increasing streams of information pose challenges to both humans and machines. On the one hand, humans need to identify relevant information and consume only the information that matches their interests. On the other hand, machines need to understand the information published in online data streams and generate concise and meaningful overviews. We consider events as prime factors for querying information and generating meaningful context. The focus of this paper is to acquire empirical insights for identifying salience features in tweets and news about a target event, i.e., the event of “whaling”. We first derive a methodology to identify such features by building up a knowledge space of the event, enriched with relevant phrases and sentiments and ranked by novelty. We applied this methodology to tweets and performed preliminary work towards adapting it to news articles. Our results show that crowdsourcing text relevance, sentiments and novelty (1) can be a main step in identifying salient information, and (2) provides a deeper and more precise understanding of the data at hand compared to state-of-the-art approaches.
Anthology ID:
L16-1625
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3959–3966
URL:
https://aclanthology.org/L16-1625
Cite (ACL):
Oana Inel, Tommaso Caselli, and Lora Aroyo. 2016. Crowdsourcing Salient Information from News and Tweets. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3959–3966, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Crowdsourcing Salient Information from News and Tweets (Inel et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1625.pdf