Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing

Kaishuai Xu; Tiezheng Yu; Wenjun Hou; Yi Cheng; Chak Tou Leong; Liangyou Li; Xin Jiang; Lifeng Shang; Qun Liu; Wenjie Li

doi:10.18653/v1/2025.acl-long.1506

Abstract

Large Language Models (LLMs) have exhibited strong mathematical reasoning ability, tackling tasks ranging from basic arithmetic to advanced competition-level problems. However, frequent subtle yet critical errors, such as miscalculations or incorrect substitutions, keep LLMs from reaching their full potential. Existing approaches to improving mathematical ability typically apply preference learning to step-wise solution pairs. Although these methods leverage samples of varying granularity to mitigate reasoning errors, they overlook critical subtle errors. In this work, we propose a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into pivotal tokens in reasoning or computation steps to construct hard pairs for error mitigation. Specifically, RISE uses the LLM itself to edit a small number of tokens in the solution, injecting designed subtle errors. Pairs composed of self-edited solutions and their corresponding correct ones, together with pairs of correct and incorrect solutions obtained through sampling, are then used for subtle error-aware DPO training. Compared with other preference learning methods, RISE further refines the training objective without requiring fine-grained sampling or preference annotation. Extensive experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH with only 4.5K training samples. Moreover, the effect of error mitigation extends from mathematical reasoning to logical reasoning and code generation.
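
The pair construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `inject_numeric_error` is a hypothetical rule-based stand-in for the LLM self-editing step, `PreferencePair` and `build_pairs` are illustrative names, and `dpo_loss` is the standard DPO objective for a single pair rather than the subtle error-aware variant, whose details the abstract does not give.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # correct solution
    rejected: str  # self-edited or sampled incorrect solution
    source: str    # "self_edit" or "sampled" (labels assumed for illustration)


def inject_numeric_error(solution: str) -> str:
    """Hypothetical stand-in for LLM self-editing: add 1 to the first number."""
    match = re.search(r"\d+", solution)
    if match is None:
        return solution  # nothing to edit
    wrong = str(int(match.group()) + 1)
    return solution[:match.start()] + wrong + solution[match.end():]


def build_pairs(prompt: str, correct: str, sampled_incorrect: list[str]) -> list[PreferencePair]:
    """Combine a self-edited hard pair with ordinary sampled pairs."""
    pairs = []
    edited = inject_numeric_error(correct)
    if edited != correct:
        pairs.append(PreferencePair(prompt, correct, edited, "self_edit"))
    for bad in sampled_incorrect:
        pairs.append(PreferencePair(prompt, correct, bad, "sampled"))
    return pairs


def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


pairs = build_pairs("What is 2 + 3?", "2 + 3 = 5", ["2 + 3 = 6"])
for pair in pairs:
    print(pair.source, "|", pair.chosen, "|", pair.rejected)
```

The self-edited pair differs from the correct solution in only a few characters, which is what makes it a hard pair; the sampled pair may differ arbitrarily. In practice the edit would be produced by prompting the model itself, and the log-probabilities would come from the policy and a frozen reference model.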

Anthology ID: 2025.acl-long.1506
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 31184–31203
URL: https://aclanthology.org/2025.acl-long.1506/
DOI: 10.18653/v1/2025.acl-long.1506
Cite (ACL): Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, and Wenjie Li. 2025. Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31184–31203, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing (Xu et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1506.pdf