Aditya Pillai
2024
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
Yuxia Wang
|
Revanth Gangi Reddy
|
Zain Muhammad Mujahid
|
Arnav Arora
|
Aleksandr Rubashevskii
|
Jiahui Geng
|
Osama Mohammed Afzal
|
Liangming Pan
|
Nadav Borenstein
|
Aditya Pillai
|
Isabelle Augenstein
|
Iryna Gurevych
|
Preslav Nakov
Findings of the Association for Computational Linguistics: EMNLP 2024
The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present Factcheck-Bench, a holistic end-to-end framework for annotating and evaluating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels for fact-checking and correcting not just the final prediction, but also the intermediate steps that a fact-checking system might need to take. Based on this framework, we construct an open-domain factuality benchmark in three-levels of granularity: claim, sentence, and document. We further propose a system, Factcheck-GPT, which follows our framework, and we show that it outperforms several popular LLM fact-checkers. We make our annotation tool, annotated data, benchmark, and code available at https://github.com/yuxiaw/Factcheck-GPT.