Stanceosaurus: Classifying Stance Towards Multicultural Misinformation

Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, Alan Ritter


Abstract
We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi and Arabic annotated with stance towards 250 misinformation claims. As far as we are aware, it is the largest corpus annotated with stance towards misinformation claims. The claims in Stanceosaurus originate from 15 fact-checking sources that cover diverse geographical regions and cultures. Unlike existing stance datasets, we introduce a more fine-grained 5-class labeling strategy with additional subcategories to distinguish implicit stance. Pre-trained transformer-based stance classifiers that are fine-tuned on our corpus show good generalization on unseen claims and regional claims from countries outside the training data. Cross-lingual experiments show that Stanceosaurus can be used to train multilingual models, which achieve 53.1 F1 on Hindi and 50.4 F1 on Arabic without any target-language fine-tuning. Finally, we show how a domain adaptation method can be used to improve performance on Stanceosaurus using additional RumourEval-2019 data. We will make Stanceosaurus publicly available to the research community upon publication and hope it will encourage further work on misinformation identification across languages and cultures.
Anthology ID:
2022.emnlp-main.138
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2132–2151
URL:
https://aclanthology.org/2022.emnlp-main.138
DOI:
10.18653/v1/2022.emnlp-main.138
Cite (ACL):
Jonathan Zheng, Ashutosh Baheti, Tarek Naous, Wei Xu, and Alan Ritter. 2022. Stanceosaurus: Classifying Stance Towards Multicultural Misinformation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2132–2151, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Stanceosaurus: Classifying Stance Towards Multicultural Misinformation (Zheng et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.138.pdf
Software:
 2022.emnlp-main.138.software.zip