FoVer: First-Order Logic Verification for Natural Language Reasoning

Yu Pei, Yongping Du, Xingnan Jin


Abstract
Large Language Models (LLMs) have shown remarkable capabilities in various tasks, including logical reasoning. However, their propensity for generating incorrect or inconsistent responses remains a significant concern. To address this issue, we propose FoVer (First-order logic Verification), an automated pipeline that verifies the logical correctness of reasoning texts using first-order logic. The pipeline operates in two main steps: (1) LLM-driven translation of natural language into executable logical expressions, and (2) automated logical verification using the Z3 theorem prover. We evaluate FoVer on specialized logical datasets (ProofWriter and FOLIO) and on real-world LLM outputs (REVEAL). The results demonstrate that FoVer significantly outperforms existing methods in logical verification, showing notable improvements in accuracy across both ideal and practical scenarios. The pipeline also demonstrates its potential for identifying annotation errors in existing datasets, and it could be utilized for constructing new logical reasoning datasets. This work presents a significant step forward in enhancing the trustworthiness of LLM outputs, particularly in tasks requiring logical integrity.
Anthology ID:
2025.tacl-1.61
Volume:
Transactions of the Association for Computational Linguistics, Volume 13
Year:
2025
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
1340–1359
URL:
https://aclanthology.org/2025.tacl-1.61/
DOI:
10.1162/tacl.a.41
Cite (ACL):
Yu Pei, Yongping Du, and Xingnan Jin. 2025. FoVer: First-Order Logic Verification for Natural Language Reasoning. Transactions of the Association for Computational Linguistics, 13:1340–1359.
Cite (Informal):
FoVer: First-Order Logic Verification for Natural Language Reasoning (Pei et al., TACL 2025)
PDF:
https://aclanthology.org/2025.tacl-1.61.pdf