SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Yanxiao Zhao; Yaqian Li; Zi-Hao Bo; Rinyoichi Takezoe; Haojia Hui; Mo Guang; Renlei; Xiaolin Qin; Kaiwen Long

Abstract

Large language models (LLMs) exhibit strong general reasoning, yet the community lacks controllable, scalable, and verifiable tools to analyze and improve these abilities. We present SATQuest, a verifier that generates diverse SAT-based reasoning tasks directly from Conjunctive Normal Form (CNF) instances and checks answers objectively with PySAT. SATQuest factorizes evaluation along three orthogonal dimensions (instance, problem type, and question format), enabling fine-grained, multi-dimensional analysis and reinforcement fine-tuning. Randomized CNF generation mitigates memorization and supports reproducible experiments. Using SATQuest, we benchmark a range of open- and closed-weight LLMs and uncover persistent gaps in logical reasoning, particularly on higher-complexity tasks and in transferring from familiar mathematical notation to machine or narrative formats. We further show that reinforcement fine-tuning with SATQuest rewards substantially improves performance on the targeted tasks and generalizes to larger instances, while cross-format robustness remains challenging. Overall, SATQuest provides verifier-backed infrastructure for controlled, scalable, and reproducible empirical research on LLM logical reasoning and its training.

Anthology ID: 2026.acl-long.96
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2109–2131
URL: https://aclanthology.org/2026.acl-long.96/
Cite (ACL): Yanxiao Zhao, Yaqian Li, Zi-Hao Bo, Rinyoichi Takezoe, Haojia Hui, Mo Guang, Renlei, Xiaolin Qin, and Kaiwen Long. 2026. SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2109–2131, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs (Zhao et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.96.pdf
Checklist: 2026.acl-long.96.checklist.pdf