Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Yen-Ting Lin; Di Jin; Tengyu Xu; Tianhao Wu; Sainbayar Sukhbaatar; Chen Zhu; Yun He; Yun-Nung Chen; Jason E Weston; Yuandong Tian; Arash Rahnama; Sinong Wang; Hao Ma; Han Fang

doi:10.18653/v1/2025.mathnlp-main.2

Abstract

Large language models (LLMs) have recently demonstrated remarkable success in mathematical reasoning. Although methods such as chain-of-thought prompting and self-consistency sampling have driven much of this progress, they typically focus on final-answer correctness without ensuring that the underlying reasoning process is coherent and reliable. This paper introduces Step-KTO, a training framework that combines process-level and outcome-level binary feedback to guide LLMs toward more trustworthy reasoning trajectories. By providing binary evaluations of both the intermediate reasoning steps and the final answer, Step-KTO encourages the model to follow logical progressions rather than rely on superficial shortcuts. Our experiments on challenging mathematical benchmarks show that Step-KTO significantly improves both final-answer accuracy and the quality of intermediate reasoning steps. For example, on the MATH-500 dataset, Step-KTO achieves a notable improvement in Pass@1 accuracy over strong baselines. These results highlight the promise of integrating stepwise process feedback into LLM training, paving the way toward more interpretable and dependable reasoning capabilities.
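The abstract does not spell out the training objective. What follows is therefore only a minimal sketch of how binary outcome and step labels could drive a KTO-style loss. It assumes the per-example Kahneman-Tversky Optimization (KTO) loss lambda_y - v(x, y), where v is a sigmoid of beta times the policy/reference log-ratio measured against a reference point z_ref. The function names, the clamped batch-mean estimate of z_ref, the averaging over steps, and the `step_weight` coefficient are hypothetical choices made for illustration. They are not the authors' implementation. Log-ratios (log pi_theta - log pi_ref) are assumed to be precomputed for the full solution and for each step.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def estimate_reference_point(log_ratios: Sequence[float]) -> float:
    """Hypothetical simplification: z_ref as the batch-mean log-ratio, clamped at 0.
    In KTO this term is treated as a constant (no gradient flows through it)."""
    if not log_ratios:
        return 0.0
    return max(0.0, sum(log_ratios) / len(log_ratios))


def kto_loss(log_ratio: float, desirable: bool, z_ref: float, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """KTO-style per-example loss lambda_y - v(x, y) for one binary label."""
    if desirable:
        return lambda_d - lambda_d * sigmoid(beta * (log_ratio - z_ref))
    return lambda_u - lambda_u * sigmoid(beta * (z_ref - log_ratio))


def step_kto_loss(outcome_log_ratio: float, outcome_correct: bool,
                  step_log_ratios: Sequence[float], step_correct: Sequence[bool],
                  z_ref: float, beta: float = 0.1, step_weight: float = 1.0) -> float:
    """Hypothetical combination: outcome-level KTO loss plus a weighted mean of
    step-level KTO losses, one per intermediate step with its own binary label."""
    if len(step_log_ratios) != len(step_correct):
        raise ValueError("each step needs exactly one binary label")
    loss = kto_loss(outcome_log_ratio, outcome_correct, z_ref, beta)
    if step_log_ratios:
        step_losses = [kto_loss(r, ok, z_ref, beta)
                       for r, ok in zip(step_log_ratios, step_correct)]
        loss += step_weight * sum(step_losses) / len(step_losses)
    return loss
```

For example, `step_kto_loss(0.0, True, [0.0, 0.0], [True, False], z_ref=0.0)` evaluates to 1.0: each term sits at sigmoid(0) = 0.5, so the loss is 0.5 for the outcome plus 0.5 for the averaged steps. Averaging the step terms keeps long and short solutions on a comparable scale. Summing them instead would weight long derivations more heavily, which is equally plausible and is not settled by the abstract.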

Anthology ID:: 2025.mathnlp-main.2
Volume:: Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Marco Valentino, Deborah Ferreira, Mokanarangan Thayaparan, Leonardo Ranaldi, Andre Freitas
Venues:: MathNLP | WS
Publisher:: Association for Computational Linguistics
Pages:: 15–33
URL:: https://aclanthology.org/2025.mathnlp-main.2/
DOI:: 10.18653/v1/2025.mathnlp-main.2
Cite (ACL):: Yen-Ting Lin, Di Jin, Tengyu Xu, Tianhao Wu, Sainbayar Sukhbaatar, Chen Zhu, Yun He, Yun-Nung Chen, Jason E Weston, Yuandong Tian, Arash Rahnama, Sinong Wang, Hao Ma, and Han Fang. 2025. Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback. In Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025), pages 15–33, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback (Lin et al., MathNLP 2025)
PDF:: https://aclanthology.org/2025.mathnlp-main.2.pdf