Xiao Zhu
2025
FaStFact: Faster, Stronger Long-Form Factuality Evaluations in LLMs
Yingjia Wan | Haochen Tan | Xiao Zhu | Xinyu Zhou | Zhiwei Li | Qingsong Lv | Changxuan Sun | Jiaqi Zeng | Yi Xu | Jianqiao Lu | Yinhong Liu | Zhijiang Guo
Findings of the Association for Computational Linguistics: EMNLP 2025
Evaluating the factuality of long-form generations from Large Language Models (LLMs) remains challenging due to accuracy issues and costly human assessment. Prior evaluation pipelines approach the task by decomposing text into claims, searching for evidence, and verifying the claims, but suffer from two critical drawbacks: (1) inefficiency, because their complex pipeline components are unsuited to long LLM outputs, and (2) ineffectiveness, stemming from inaccurate claim sets and from insufficient evidence collection that relies on one-line SERP snippets. To address these limitations, we adapt the existing decompose-then-verify evaluation framework and propose **FaStFact**, a fast and strong evaluation pipeline that achieves the highest alignment with human evaluation and the highest efficiency among existing baselines. FaStFact first employs chunk-level claim extraction integrated with confidence-based pre-verification, significantly reducing the cost of web searches and inference calls while ensuring reliability. For searching and verification, it gathers document-level evidence from crawled web pages for retrieval during verification, addressing the evidence insufficiency problem in previous pipelines. Extensive experiments on an aggregated and manually annotated benchmark demonstrate the reliability of FaStFact in evaluating the factuality of long-form LLM generations both efficiently and effectively. We submit the paper with code and the benchmark, and will make them publicly available to facilitate research.
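The first stage described in the abstract, chunk-level extraction followed by confidence-based pre-verification, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Claim` fields, the `chunk_sentences` and `route_claims` helpers, and the 0.9 threshold are all hypothetical, chosen only to show how a confidence cutoff could spare high-confidence claims from the web-search step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Claim:
    """Hypothetical claim record; the field names are assumptions."""
    text: str
    label: str         # pre-verification guess, e.g. "supported" or "unsupported"
    confidence: float  # extractor's confidence in that guess, in [0, 1]


def chunk_sentences(sentences: List[str], size: int) -> List[List[str]]:
    """Group sentences into consecutive chunks of at most `size` sentences."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def route_claims(
    claims: List[Claim], threshold: float = 0.9
) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into (settled, needs_search).

    Claims at or above the (assumed) confidence threshold keep their
    pre-verification label; the rest are sent on to evidence search.
    """
    settled: List[Claim] = []
    needs_search: List[Claim] = []
    for claim in claims:
        if claim.confidence >= threshold:
            settled.append(claim)
        else:
            needs_search.append(claim)
    return settled, needs_search
```

Under these assumptions, only the claims returned in `needs_search` would incur a search and a verification call, which is the kind of saving the abstract attributes to pre-verification.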