Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning

Senjie Jin; Lu Chen; Zhiheng Xi; Yuhui Wang; Sirui Song; Yuhao Zhou; Xinbo Zhang; Peng Sun; Hong Lu; Tao Gui; Qi Zhang; Xuan-Jing Huang (黄萱菁)

doi:10.18653/v1/2025.emnlp-main.1366

Abstract

Natural language chain-of-thought (N-CoT) and program chain-of-thought (P-CoT) have emerged as the two primary paradigms by which large language models (LLMs) solve mathematical reasoning problems. Current research typically pursues unidirectional enhancement: using P-CoT to enhance N-CoT, or N-CoT to enhance P-CoT. In this paper, we seek to fully exploit the strengths of both paradigms for mutual enhancement and ultimately achieve simultaneous improvements. We conduct a detailed analysis of the error types across the two paradigms, based on which we propose Parrot, a novel training pipeline for mathematical problems: 1) three targeted subtasks that integrate sequential P-CoT and N-CoT generation; 2) a subtask hybrid training strategy that facilitates natural language semantic transferability; and 3) a converted N-CoT auxiliary reward designed to alleviate sparse rewards in P-CoT optimization. Extensive experiments demonstrate that Parrot significantly improves both N-CoT and P-CoT performance, especially N-CoT. With Parrot SFT, the N-CoT performance of LLaMA2 and CodeLLaMA improves by +21.87 and +21.48, respectively, on MathQA over the resource-intensive RL baseline.
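To make the sparse-reward point concrete, the following is a minimal sketch, not the paper's implementation. It assumes a P-CoT solution is a Python program that stores its result in a variable named `answer`, and that the reward is 1 only when the final answer matches the gold answer. The helper names (`run_pcot`, `sparse_reward`, `combined_reward`), the additive combination, and the `aux_weight` parameter are hypothetical and chosen only for illustration; the abstract does not specify how the auxiliary reward is computed or weighted.

```python
def run_pcot(program: str):
    """Execute a P-CoT program and return its `answer` variable (None on failure)."""
    scope = {}
    try:
        exec(program, scope)  # assumption: the program is trusted toy code
    except Exception:
        return None  # a crashing program yields no answer
    return scope.get("answer")


def sparse_reward(predicted, gold, tol: float = 1e-6) -> float:
    """Binary outcome reward: 1.0 if the final answer matches the gold answer, else 0.0."""
    if predicted is None:
        return 0.0
    try:
        return 1.0 if abs(float(predicted) - float(gold)) <= tol else 0.0
    except (TypeError, ValueError):
        return 0.0


def combined_reward(pcot_answer, ncot_answer, gold, aux_weight: float = 0.5) -> float:
    """Hypothetical combination: P-CoT outcome reward plus a weighted N-CoT auxiliary term."""
    return sparse_reward(pcot_answer, gold) + aux_weight * sparse_reward(ncot_answer, gold)


# Toy P-CoT program for "12 dollars per item, 3 items: what is the total?"
program = "price = 12\nqty = 3\nanswer = price * qty"
pcot_answer = run_pcot(program)
reward = combined_reward(pcot_answer, ncot_answer=36, gold=36)
```

In this sketch, a program that crashes or returns a wrong value earns no outcome reward by itself, so the auxiliary term is the only signal left in that case; this is the general motivation for an auxiliary reward, under the assumptions stated above.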

Anthology ID: 2025.emnlp-main.1366
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 26910–26927
URL: https://aclanthology.org/2025.emnlp-main.1366/
DOI: 10.18653/v1/2025.emnlp-main.1366
Cite (ACL): Senjie Jin, Lu Chen, Zhiheng Xi, Yuhui Wang, Sirui Song, Yuhao Zhou, Xinbo Zhang, Peng Sun, Hong Lu, Tao Gui, Qi Zhang, and Xuanjing Huang. 2025. Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26910–26927, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning (Jin et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1366.pdf
Checklist: 2025.emnlp-main.1366.checklist.pdf