Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision

Xingwei Tan; Marco Valentino; Mahmud Elahi Akhter; Maria Liakata; Nikolaos Aletras

doi:10.18653/v1/2025.emnlp-main.1624

Abstract

Large language models (LLMs) have shown strong performance on many reasoning benchmarks. However, recent studies have pointed to memorization, rather than generalization, as one of the leading causes of such performance. Indeed, LLMs are susceptible to content variations, demonstrating a lack of robust planning or symbolic abstractions supporting their reasoning process. To improve reliability, many attempts have been made to combine LLMs with symbolic methods. Nevertheless, existing approaches fail to leverage symbolic representations effectively because reliable and scalable verification mechanisms are difficult to develop. In this paper, we propose to overcome these limitations by synthesizing high-quality symbolic reasoning trajectories with stepwise pseudo-labels at scale via Monte Carlo estimation. A Process Reward Model (PRM) can be trained efficiently on the synthesized data and then used to select more symbolic trajectories. The trajectories are then employed with Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT) to improve logical reasoning and generalization. Our results on benchmarks (i.e., FOLIO and LogicAsker) show the effectiveness of the proposed method, with gains on frontier and open-weight models. Moreover, additional experiments on claim verification data reveal that fine-tuning on the generated symbolic reasoning trajectories enhances out-of-domain generalizability, suggesting the potential impact of the proposed method in enhancing planning and logical reasoning.
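To make the idea of stepwise pseudo-labels via Monte Carlo estimation more concrete, the following is a minimal sketch of one common way such labels can be computed: for each prefix of a reasoning trajectory, sample several completions and record the fraction that reach the gold answer. This is an illustrative assumption, not the authors' implementation; the function names (`mc_step_labels`, `toy_rollout`), the `n_rollouts` parameter, and the toy trajectory are all hypothetical, and `toy_rollout` merely stands in for sampling from a language model.

```python
import random
from typing import Callable, List, Sequence


def mc_step_labels(
    steps: Sequence[str],
    rollout: Callable[[Sequence[str], random.Random], str],
    gold_answer: str,
    n_rollouts: int = 8,
    seed: int = 0,
) -> List[float]:
    """For each step prefix, return the fraction of sampled completions
    whose final answer matches the gold answer (a soft pseudo-label)."""
    rng = random.Random(seed)
    labels = []
    for end in range(1, len(steps) + 1):
        prefix = steps[:end]
        # Count how many sampled completions from this prefix are correct.
        hits = sum(rollout(prefix, rng) == gold_answer for _ in range(n_rollouts))
        labels.append(hits / n_rollouts)
    return labels


def toy_rollout(prefix: Sequence[str], rng: random.Random) -> str:
    # Hypothetical stand-in for a model sampler (assumption): once a flawed
    # step is in the prefix, every completion ends in the wrong answer.
    return "False" if any("INVALID" in step for step in prefix) else "True"


trajectory = [
    "P1: forall x (Cat(x) -> Animal(x))",
    "P2: Cat(tom), so Animal(tom) by modus ponens",
    "INVALID: Animal(tom), so Cat(tom)",
]
labels = mc_step_labels(trajectory, toy_rollout, gold_answer="True")
print(labels)
```

In this toy setting the first two steps receive a label of 1.0 and the flawed third step receives 0.0. With a real sampler the estimates would be noisy, and the labels could in principle serve as training targets for a step-level reward model.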

Anthology ID: 2025.emnlp-main.1624
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 31886–31900
URL: https://aclanthology.org/2025.emnlp-main.1624/
DOI: 10.18653/v1/2025.emnlp-main.1624
Cite (ACL): Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Maria Liakata, and Nikolaos Aletras. 2025. Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31886–31900, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision (Tan et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1624.pdf
Checklist: 2025.emnlp-main.1624.checklist.pdf