Temporal Consistency for LLM Reasoning Process Error Identification

Jiacheng Guo, Yue Wu, Jiahao Qiu, Kaixuan Huang, Xinzhe Juan, Ling Yang, Mengdi Wang


Abstract
Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method in which verifiers iteratively refine their judgments based on their previous assessments. Unlike one-round verification or multi-model debate approaches, our method leverages consistency across a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (MathCheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek-R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1.
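The core idea in the abstract, iterating a verifier until its judgment stays stable across consecutive self-reflection rounds, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`temporal_consistency_verify`, `make_stub_verifier`) and the stopping parameters (`stable_rounds`, `max_rounds`) are hypothetical, and a fixed-trajectory stub stands in for the LLM verifier.

```python
def make_stub_verifier(trajectory):
    """Stub standing in for an LLM verifier. Yields judgments (True =
    step correct, False = error found) from a fixed trajectory to
    simulate a verifier that flip-flops early and later stabilizes."""
    it = iter(trajectory)

    def verify(step, previous_judgment=None):
        # A real verifier would be prompted with the step and its own
        # previous assessment; the stub ignores both.
        return next(it, trajectory[-1])

    return verify


def temporal_consistency_verify(step, verify, stable_rounds=3, max_rounds=10):
    """Accept a judgment only after it has stayed unchanged for
    `stable_rounds` consecutive self-reflection rounds, re-querying the
    verifier with its previous judgment each round."""
    judgment = verify(step)
    streak = 1
    for _ in range(max_rounds - 1):
        new_judgment = verify(step, previous_judgment=judgment)
        streak = streak + 1 if new_judgment == judgment else 1
        judgment = new_judgment
        if streak >= stable_rounds:
            break  # judgment has been consistent long enough
    return judgment


# Usage: an initial "correct" verdict is overturned once the verifier
# settles on "error" for three consecutive rounds.
verify = make_stub_verifier([True, False, False, False])
result = temporal_consistency_verify("step 3: 2x = 6, so x = 4", verify)
# → False (the step is judged erroneous)
```

One-round verification would have returned the verifier's first (unstable) answer; requiring a streak of identical judgments is what filters out transient flip-flops.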
Anthology ID:
2025.findings-emnlp.1205
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22114–22129
URL:
https://aclanthology.org/2025.findings-emnlp.1205/
Cite (ACL):
Jiacheng Guo, Yue Wu, Jiahao Qiu, Kaixuan Huang, Xinzhe Juan, Ling Yang, and Mengdi Wang. 2025. Temporal Consistency for LLM Reasoning Process Error Identification. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 22114–22129, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Temporal Consistency for LLM Reasoning Process Error Identification (Guo et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1205.pdf
Checklist:
2025.findings-emnlp.1205.checklist.pdf