Data-Efficient Auto-Regressive Document Retrieval for Fact Verification

James Thorne

doi:10.18653/v1/2022.sustainlp-1.7


Abstract

Document retrieval is a core component of many knowledge-intensive natural language processing task formulations, such as fact verification. Sources of textual knowledge, such as Wikipedia articles, condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia page given an input instance. However, this method requires supervision in the form of human annotation to label which Wikipedia pages contain appropriate context. This paper introduces a distant-supervision method that does not require any annotation to train auto-regressive retrievers that attain competitive R-Precision and Recall in a zero-shot setting. Furthermore, we show that with task-specific supervised fine-tuning, auto-regressive retrieval performance for two Wikipedia-based fact verification tasks can approach or even exceed that of full supervision while using less than 1/4 of the annotated data. We release all code and models.
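The abstract reports retrieval quality as R-Precision and Recall. As a rough illustration of what these measure, the sketch below scores a ranked list of predicted Wikipedia page titles against a single set of gold pages. R-Precision is the precision over the top R predictions, where R is the number of gold pages. The function names `r_precision` and `recall_at_k` and the example titles are hypothetical. The sketch also assumes one gold evidence set per claim, a simplification: benchmark evaluation scripts may handle multiple alternative evidence sets differently, so this is not the paper's evaluation code.

```python
from typing import List, Sequence, Set


def _dedupe(predicted: Sequence[str]) -> List[str]:
    # Keep the first occurrence of each title so that a page predicted twice
    # (e.g. from two decoding beams) is not counted twice.
    return list(dict.fromkeys(predicted))


def r_precision(predicted: Sequence[str], gold: Set[str]) -> float:
    """Precision over the top-R ranked titles, where R = number of gold pages."""
    r = len(gold)
    if r == 0:
        return 0.0
    top_r = _dedupe(predicted)[:r]
    return sum(1 for title in top_r if title in gold) / r


def recall_at_k(predicted: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the gold pages that appear among the top-k ranked titles."""
    if not gold:
        return 0.0
    return len(set(_dedupe(predicted)[:k]) & gold) / len(gold)


if __name__ == "__main__":
    # Hypothetical ranked output of an auto-regressive retriever for one claim.
    ranked = ["Barack Obama", "Michelle Obama", "Joe Biden"]
    gold_pages = {"Barack Obama", "Joe Biden"}
    print(r_precision(ranked, gold_pages))     # R = 2, top-2 has 1 hit: 0.5
    print(recall_at_k(ranked, gold_pages, 3))  # both gold pages in top-3: 1.0
```

Because R is tied to the number of gold pages, R-Precision rewards ranking the relevant titles first without fixing an arbitrary cutoff, whereas Recall@k only checks whether the gold pages appear anywhere in the top k.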

Anthology ID:: 2022.sustainlp-1.7
Volume:: Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)
Month:: December
Year:: 2022
Address:: Abu Dhabi, United Arab Emirates (Hybrid)
Editors:: Angela Fan, Iryna Gurevych, Yufang Hou, Zornitsa Kozareva, Sasha Luccioni, Nafise Sadat Moosavi, Sujith Ravi, Gyuwan Kim, Roy Schwartz, Andreas Rücklé
Venue:: sustainlp
Publisher:: Association for Computational Linguistics
Pages:: 44–51
URL:: https://aclanthology.org/2022.sustainlp-1.7
DOI:: 10.18653/v1/2022.sustainlp-1.7
Cite (ACL):: James Thorne. 2022. Data-Efficient Auto-Regressive Document Retrieval for Fact Verification. In Proceedings of The Third Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), pages 44–51, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):: Data-Efficient Auto-Regressive Document Retrieval for Fact Verification (Thorne, sustainlp 2022)
PDF:: https://aclanthology.org/2022.sustainlp-1.7.pdf
Video:: https://aclanthology.org/2022.sustainlp-1.7.mp4
