Crowdsourcing a Large Corpus of Clickbait on Twitter

Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen, Benno Stein


Abstract
Clickbait has become a nuisance on social media. To address the urgent task of clickbait detection, we constructed a new corpus of 38,517 annotated tweets, the Webis Clickbait Corpus 2017. To avoid biases in terms of publisher and topic, tweets were sampled from the 27 most retweeted news publishers, covering a period of 150 days. Each tweet has been annotated on a 4-point scale by five annotators recruited via Amazon’s Mechanical Turk. The corpus has been employed to evaluate 12 clickbait detectors submitted to the Clickbait Challenge 2017.
Download: https://webis.de/data/webis-clickbait-17.html
Challenge: https://clickbait-challenge.org
Anthology ID:
C18-1127
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
1498–1507
URL:
https://aclanthology.org/C18-1127
Cite (ACL):
Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernandez, Matthias Hagen, and Benno Stein. 2018. Crowdsourcing a Large Corpus of Clickbait on Twitter. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1498–1507, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Crowdsourcing a Large Corpus of Clickbait on Twitter (Potthast et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1127.pdf