RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

Dongwei Jiang; Guoxuan Wang; Yining Lu; Andrew Wang; Jingyu Zhang; Chuyu Liu; Benjamin Van Durme; Daniel Khashabi

doi:10.18653/v1/2025.acl-long.1288

Abstract

The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. We extract 79k rationales from a web-scale unlabeled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. This web-scale pre-training for reasoning allows RATIONALYST to consistently generalize across diverse reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks. It also demonstrates superior performance compared to significantly larger verifiers like GPT-4 and similarly sized models fine-tuned on matching training sets.

Anthology ID: 2025.acl-long.1288
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 26547–26566
URL: https://aclanthology.org/2025.acl-long.1288/
DOI: 10.18653/v1/2025.acl-long.1288
Cite (ACL): Dongwei Jiang, Guoxuan Wang, Yining Lu, Andrew Wang, Jingyu Zhang, Chuyu Liu, Benjamin Van Durme, and Daniel Khashabi. 2025. RATIONALYST: Pre-training Process-Supervision for Improving Reasoning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 26547–26566, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): RATIONALYST: Pre-training Process-Supervision for Improving Reasoning (Jiang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1288.pdf
