Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter

Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, Nigel Collier


Abstract
We present a new, challenging stance detection dataset, called Will-They-Won’t-They (WT–WT), which contains 51,284 tweets in English, making it by far the largest available dataset of its type. All the annotations are carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain.
Anthology ID:
2020.acl-main.157
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1715–1724
URL:
https://aclanthology.org/2020.acl-main.157
DOI:
10.18653/v1/2020.acl-main.157
Cite (ACL):
Costanza Conforti, Jakob Berndt, Mohammad Taher Pilehvar, Chryssi Giannitsarou, Flavio Toxvaerd, and Nigel Collier. 2020. Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1715–1724, Online. Association for Computational Linguistics.
Cite (Informal):
Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter (Conforti et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.157.pdf
Video:
http://slideslive.com/38929075
Code:
cambridge-wtwt/acl2020-wtwt-tweets
Data:
WT-WT