EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving

Mukai Li; Linfeng Song; Zhenwen Liang; Jiahao Xu; Shansan Gong; Qi Liu; Haitao Mi; Dong Yu (于东)

Abstract

Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP), attaining substantial performance gains through widely adopted test-time scaling strategies, notably reflective Chain-of-Thought (CoT) reasoning and increased sampling passes. However, both strategies introduce significant computational overhead at inference time. Moreover, existing cost analyses typically control only for the number of sampling passes, neglecting the substantial disparities in per-pass sampling cost introduced by different scaling strategies. In this paper, we systematically compare the efficiency of different test-time scaling strategies for ATP models and demonstrate the inefficiency of the current state-of-the-art (SOTA) open-source approaches. We then investigate approaches that significantly reduce token usage and sampling passes while maintaining the original performance. Specifically, we propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits: (1) a dynamic CoT switching mechanism designed to mitigate unnecessary token consumption, and (2) diverse parallel-scaled reinforcement learning (RL) with trainable prefixes to enhance pass rates under constrained sampling passes. Experiments on miniF2F and ProofNet demonstrate that our EconProver-GD achieves performance comparable to baseline methods at only 12% of the computational cost. This work provides actionable insights for deploying lightweight ATP models without sacrificing performance.
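The abstract's point that counting sampling passes alone hides large cost differences can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the paper, that inference cost is the number of passes multiplied by the average tokens generated per pass. The `Strategy` class, the `relative_cost` helper, and all numbers are hypothetical and are not results reported by the authors.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    """A test-time scaling setup (hypothetical, for illustration only)."""
    name: str
    passes: int           # sampled proof attempts per theorem
    tokens_per_pass: int  # average generated tokens per attempt

    def token_cost(self) -> int:
        # Assumed cost model: total generated tokens per theorem.
        return self.passes * self.tokens_per_pass


def relative_cost(candidate: Strategy, baseline: Strategy) -> float:
    """Token cost of `candidate` as a fraction of `baseline`."""
    return candidate.token_cost() / baseline.token_cost()


# Made-up numbers: both setups use the same number of passes,
# so a pass-count-only comparison would treat them as equally expensive.
long_cot = Strategy("reflective CoT", passes=32, tokens_per_pass=4000)
short_form = Strategy("no CoT", passes=32, tokens_per_pass=1000)

print(long_cot.token_cost())                # 128000
print(short_form.token_cost())              # 32000
print(relative_cost(short_form, long_cot))  # 0.25
```

Under this assumed cost model, two strategies with identical pass budgets can differ severalfold in token cost, which is why a comparison in tokens is more informative than one in passes.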

Anthology ID: 2026.acl-long.2121
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 45741–45753
URL: https://aclanthology.org/2026.acl-long.2121/
Cite (ACL): Mukai Li, Linfeng Song, Zhenwen Liang, Jiahao Xu, Shansan Gong, Qi Liu, Haitao Mi, and Dong Yu. 2026. EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45741–45753, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving (Li et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2121.pdf
Checklist: 2026.acl-long.2121.checklist.pdf