The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity

Fatemeh Torabi Asr, Maite Taboada


Abstract
Misinformation detection at the level of full news articles is a text classification problem. Reliably labeled data in this domain is rare. Previous work relied on news articles collected from so-called "reputable" and "suspicious" websites and labeled accordingly. We leverage fact-checking websites to collect individually labeled news articles with regard to the veracity of their content, and use this data to test the cross-domain generalization of a classifier trained on larger text collections but labeled according to source reputation. Our results suggest that reputation-based classification is not sufficient for predicting the veracity level of the majority of news articles, and that system performance on different test datasets depends on topic distribution. Therefore, collecting well-balanced and carefully assessed training data is a priority for developing robust misinformation detection systems.
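The evaluation setup described in the abstract (training on labels inherited from the publishing website, then testing on articles labeled one by one by fact-checkers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' classifier or data: the site lists, the toy texts, and the choice of a multinomial Naive Bayes model are hypothetical stand-ins picked so the example runs with the Python standard library alone.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical site lists (placeholders, not the lists used in the paper).
# Under reputation-based labeling, every article inherits its site's label.
SUSPICIOUS_SITES = {"example-suspicious.com"}
REPUTABLE_SITES = {"example-reputable.org"}


def reputation_label(url):
    """Label an article by the domain that published it, ignoring its content."""
    host = urlparse(url).netloc.lower()
    if host in SUSPICIOUS_SITES:
        return "fake"
    if host in REPUTABLE_SITES:
        return "real"
    return None  # unknown source: leave the article unlabeled


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior of class c
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, texts, gold):
    """Fraction of articles whose prediction matches the per-article verdict."""
    hits = sum(model.predict(t) == g for t, g in zip(texts, gold))
    return hits / len(gold)


# Training set (toy strings): labels come only from the publishing site.
train = [
    ("https://example-suspicious.com/a", "shocking secret cure they hide"),
    ("https://example-reputable.org/b", "council approves annual budget report"),
    ("https://unknown-blog.net/c", "weekend recipes"),
]
labeled = [(text, reputation_label(url)) for url, text in train]
labeled = [(t, y) for t, y in labeled if y is not None]
model = NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])

# Test set (toy strings): each article carries its own fact-check verdict.
test_texts = ["council approves secret report", "shocking cure they hide"]
test_gold = ["real", "fake"]
print(accuracy(model, test_texts, test_gold))
```

The toy numbers say nothing about the paper's findings. The point of the structure is that training labels are a function of the URL alone, while test labels are per-article verdicts, so any mismatch between source reputation and content veracity only becomes visible at evaluation time. A real replication would substitute the actual training collections, the fact-checked test set, and a stronger classifier.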
Anthology ID: W18-5502
Volume: Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)
Month: November
Year: 2018
Address: Brussels, Belgium
Editors: James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, Arpit Mittal
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 10–15
URL: https://aclanthology.org/W18-5502/
DOI: 10.18653/v1/W18-5502
Cite (ACL): Fatemeh Torabi Asr and Maite Taboada. 2018. The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pages 10–15, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): The Data Challenge in Misinformation Detection: Source Reputation vs. Content Veracity (Torabi Asr & Taboada, EMNLP 2018)
PDF: https://aclanthology.org/W18-5502.pdf
Code: sfu-discourse-lab/Misinformation_detection
Data: FEVER