FEVER: a Large-scale Dataset for Fact Extraction and VERification

James Thorne; Andreas Vlachos; Christos Christodoulopoulos; Arpit Mittal

doi:10.18653/v1/N18-1074

Abstract

In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss kappa. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.
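
The agreement figure quoted in the abstract is a Fleiss kappa over the three labels. As a rough illustration of how such a score can be computed, the sketch below implements the standard Fleiss kappa formula in plain Python. The function name `fleiss_kappa`, the input layout (one list of labels per claim), and the toy ratings are assumptions made for this example; they are not taken from the paper's annotation data or from the sheffieldnlp/fever-baselines code.

```python
from collections import Counter

# The three classes named in the abstract.
LABELS = ("Supported", "Refuted", "NotEnoughInfo")


def fleiss_kappa(ratings, categories=LABELS):
    """Fleiss kappa for a list of items, each rated by the same number of annotators.

    ratings: list of lists; ratings[i] holds the labels assigned to item i.
    (Hypothetical input layout, chosen for illustration only.)
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    if any(label not in categories for r in ratings for label in r):
        raise ValueError("unknown label in ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: for each item, the fraction of annotator pairs that agree.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall label proportions.
    proportions = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy ratings invented for this example: three annotators, four claims.
    toy = [
        ["Supported", "Supported", "Supported"],
        ["Refuted", "Refuted", "NotEnoughInfo"],
        ["NotEnoughInfo", "NotEnoughInfo", "NotEnoughInfo"],
        ["Supported", "Refuted", "Refuted"],
    ]
    print(f"Fleiss kappa on toy ratings: {fleiss_kappa(toy):.4f}")
```

A value of 1 means the annotators always agree, and a value near 0 means agreement is no better than chance given the overall label distribution.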

Anthology ID: N18-1074
Volume: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month: June
Year: 2018
Address: New Orleans, Louisiana
Editors: Marilyn Walker, Heng Ji, Amanda Stent
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 809–819
URL: https://aclanthology.org/N18-1074/
DOI: 10.18653/v1/N18-1074
Cite (ACL): James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal. 2018. FEVER: a Large-scale Dataset for Fact Extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 809–819, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal): FEVER: a Large-scale Dataset for Fact Extraction and VERification (Thorne et al., NAACL 2018)
PDF: https://aclanthology.org/N18-1074.pdf
Note: N18-1074.Notes.pdf
Code: sheffieldnlp/fever-baselines + additional community code
Data: FEVER, SNLI
