ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Zhao Wenyi, Jie Tang, Yuxiao Dong


Abstract
Large language models (LLMs) have shown excellent mastering of human language but still struggle in real-world applications that require mathematical problem-solving. While many strategies and datasets to enhance LLMs’ mathematics are developed, it remains a challenge to simultaneously maintain and improve both language and mathematical capabilities in deployed LLM systems. In this work, we tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially employ rejective fine-tuning and direct preference optimization over the LLM’s own generations for data collection. Based on ChatGLM3-32B, we conduct experiments on both academic and our newly created challenging dataset, MathUserEval. Results show that our pipeline significantly enhances the LLM’s mathematical problem-solving while still improving its language ability, outperforming LLMs that could be two times larger. Related techniques have been deployed to ChatGLM, an online serving LLM. Related evaluation datasets and scripts are released at https://github.com/THUDM/ChatGLM-Math.
Anthology ID:
2024.findings-emnlp.569
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
9733–9760
Language:
URL:
https://aclanthology.org/2024.findings-emnlp.569
DOI:
Bibkey:
Cite (ACL):
Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Zhao Wenyi, Jie Tang, and Yuxiao Dong. 2024. ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9733–9760, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline (Xu et al., Findings 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.findings-emnlp.569.pdf