Improving Model Factuality with Fine-grained Critique-based Evaluator

Yiqing Xie; Wenxuan Zhou; Pradyot Prakash; Di Jin; Yuning Mao; Quintin Fettes; Arya Talebzadeh; Sinong Wang; Han Fang; Carolyn Rose; Daniel Fried; Hejia Zhang

doi:10.18653/v1/2025.acl-long.400

Abstract

Factuality evaluation aims to detect factual errors produced by language models (LMs) and hence guide the development of more factual models. Towards this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. In particular, we train FenCE to (1) generate textual critiques along with scores and (2) make claim-level judgments based on diverse source documents obtained by various tools, via data augmentation on a combination of public judgment datasets. We then present a framework that leverages FenCE to improve the factuality of LM generators by constructing training data. Specifically, we generate a set of candidate responses, ask FenCE to revise and score each response without introducing lesser-known facts, and train the generator by preferring highly scored revised responses. Experiments show that our data augmentation methods improve the evaluator’s accuracy by 2.9% on LLM-AggreFact. With FenCE, we improve the factuality rate of Llama2-7B-chat/Llama3-8B-chat by 16.86%/14.45% on FActScore, outperforming state-of-the-art factuality finetuning methods by 8.83%/6.96%.
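
The data-construction step described in the abstract (generate candidates, have the evaluator revise and score each one, then prefer highly scored revised responses) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: `revise_and_score` stands in for a call to an evaluator such as FenCE, the score is assumed to be a number where higher means more factual, and pairing the highest-scored against the lowest-scored revised response with a `min_margin` threshold is just one plausible way to form a preference pair.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ScoredResponse:
    text: str     # revised response text returned by the evaluator
    score: float  # assumed factuality score; higher means more factual


def build_preference_pair(
    prompt: str,
    candidates: List[str],
    revise_and_score: Callable[[str, str], Tuple[str, float]],
    min_margin: float = 0.1,
) -> Optional[Dict[str, str]]:
    """Turn candidate responses into one (chosen, rejected) training pair.

    `revise_and_score` is a hypothetical evaluator hook: given a prompt and
    a candidate response, it returns (revised_text, factuality_score).
    """
    scored = [ScoredResponse(*revise_and_score(prompt, c)) for c in candidates]
    if len(scored) < 2:
        return None  # a preference pair needs at least two responses

    best = max(scored, key=lambda r: r.score)
    worst = min(scored, key=lambda r: r.score)

    # Skip prompts whose candidates are too close in score to give a
    # meaningful preference signal (the threshold is an assumption).
    if best.score - worst.score < min_margin:
        return None

    return {"prompt": prompt, "chosen": best.text, "rejected": worst.text}
```

Pairs in this `prompt`/`chosen`/`rejected` shape are what preference-based finetuning methods typically consume; the abstract does not say which training objective is used, so that choice is left open here.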

Anthology ID: 2025.acl-long.400
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 8140–8155
URL: https://aclanthology.org/2025.acl-long.400/
DOI: 10.18653/v1/2025.acl-long.400
Cite (ACL): Yiqing Xie, Wenxuan Zhou, Pradyot Prakash, Di Jin, Yuning Mao, Quintin Fettes, Arya Talebzadeh, Sinong Wang, Han Fang, Carolyn Rose, Daniel Fried, and Hejia Zhang. 2025. Improving Model Factuality with Fine-grained Critique-based Evaluator. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8140–8155, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Improving Model Factuality with Fine-grained Critique-based Evaluator (Xie et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.400.pdf
