Automated Fact-Checking of Claims from Wikipedia

Aalok Sathe; Salar Ather; Tuan Manh Le; Nathan Perry; Joonsuk Park

Abstract

Automated fact checking is becoming increasingly vital as both truthful and fallacious information accumulate online. Research on fact checking has benefited from large-scale datasets such as FEVER and SNLI. However, such datasets suffer from limited applicability because their claims and/or evidence are synthetic: they are written by annotators and differ from real claims and evidence on the internet. To address this, we present WikiFactCheck-English, a dataset of 124k+ triples consisting of a claim, context and an evidence document extracted from English Wikipedia articles and citations, as well as 34k+ manually written claims that are refuted by the evidence documents. This is the largest fact-checking dataset consisting of real claims and evidence to date; it will allow the development of fact-checking systems that can better process claims and evidence in the real world. We also show that for the NLI subtask, a logistic regression system trained using existing and novel features achieves a peak accuracy of 68%, providing a competitive baseline for future work. In addition, a decomposable attention model trained on SNLI significantly underperforms the models trained on this dataset, suggesting that models trained on manually generated data may not be sufficiently generalizable or suitable for fact checking real-world claims.
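As a rough illustration of the triple structure described in the abstract, the sketch below models one example as a claim, a context and an evidence document, and computes a single token-overlap feature of the kind a logistic regression baseline might consume. The class name `Example`, the label strings, the `token_overlap` function and the sample text are all assumptions made up for illustration; they are not taken from the dataset or from the authors' feature set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One (claim, context, evidence) triple; field names are hypothetical."""
    claim: str
    context: str
    evidence: str
    label: str  # assumed label names: "supported" or "refuted"


def token_overlap(claim: str, evidence: str) -> float:
    """Fraction of distinct claim tokens that also occur in the evidence."""
    claim_tokens = set(claim.lower().split())
    evidence_tokens = set(evidence.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & evidence_tokens) / len(claim_tokens)


# Toy example invented for this sketch, not drawn from WikiFactCheck-English.
example = Example(
    claim="the bridge opened in 1932",
    context="history section of a hypothetical article",
    evidence="the bridge opened to traffic in 1932",
    label="supported",
)
overlap = token_overlap(example.claim, example.evidence)
```

A feature like this would be one column among many in a feature vector; the features actually used in the paper may differ.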

Anthology ID: 2020.lrec-1.849
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 6874–6882
Language: English
URL: https://aclanthology.org/2020.lrec-1.849/
Cite (ACL): Aalok Sathe, Salar Ather, Tuan Manh Le, Nathan Perry, and Joonsuk Park. 2020. Automated Fact-Checking of Claims from Wikipedia. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6874–6882, Marseille, France. European Language Resources Association.
Cite (Informal): Automated Fact-Checking of Claims from Wikipedia (Sathe et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.849.pdf
