Retrieval-Augmented Language Models are Mimetic Theorem Provers
Wenjie Yang, Ruiyuan Huang, Jiaxing Guo, Zicheng Lyu, Tongshan Xu, Shengzhong Zhang, Lun Du, Da Zheng, Zengfeng Huang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) have demonstrated considerable capabilities in various mathematical tasks, yet they often fall short in the rigorous, proof-based reasoning essential for research-level mathematics. Retrieval-augmented generation (RAG) presents a promising direction for enhancing these capabilities. This paper systematically explores RAG for natural language theorem proving, revealing that LLMs, when augmented with retrieved proofs rather than just theorem statements, can function as potent mimetic theorem provers: they can effectively generalize proof techniques found in unstructured retrieved contexts to construct correct proofs for novel theorems. Building upon this finding, we introduce Dual RAG, a simple yet effective RAG framework. Dual RAG employs LLMs to identify the underlying reasoning challenges within theorems, augmenting both queries and document contexts to improve retrieval performance. Our experiments show that Dual RAG achieves substantial improvements in retrieval performance, with gains of up to 34.19%. Expert evaluations further confirm that these retrieval enhancements translate directly into higher-quality proof generation. Notably, when integrated with the arXiv API, Dual RAG demonstrates the ability to prove research-level theorems in theoretical machine learning, highlighting its strong potential as a foundational element for a practical mathematical copilot.
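The dual augmentation the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: all function names (`identify_challenges`, `augment_query`, `augment_document`, `retrieve`) are invented for this sketch, the LLM call is replaced by a fixed stub, and the toy word-overlap retriever stands in for whatever dense retriever the actual system uses.

```python
# Hedged sketch of the Dual RAG idea: an LLM first names the reasoning
# challenges a theorem poses, and that description augments BOTH the
# retrieval query and each indexed (theorem, proof) document, so matches
# can happen at the level of proof technique rather than surface wording.

def identify_challenges(theorem: str) -> str:
    """Stand-in for an LLM call that summarizes the proof techniques a
    theorem likely requires (e.g. 'monotone convergence, supremum')."""
    # In a real system this would be an LLM completion; here a fixed stub.
    return "key reasoning challenge for: " + theorem

def augment_query(theorem: str) -> str:
    # Query-side augmentation: theorem text plus identified challenges.
    return theorem + "\n[challenges] " + identify_challenges(theorem)

def augment_document(theorem: str, proof: str) -> str:
    # Document-side augmentation: index each proof together with the
    # challenges its theorem addresses.
    return theorem + "\n[challenges] " + identify_challenges(theorem) + "\n" + proof

def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Toy lexical retriever over (theorem, proof) pairs, ranking
    augmented documents by word overlap with the augmented query."""
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    docs = [augment_document(t, p) for t, p in corpus]
    return sorted(docs, key=overlap, reverse=True)[:k]
```

A retrieved augmented document (theorem, challenges, and full proof) would then be placed in the prover LLM's context, which is where the mimetic behavior the paper reports comes in: the model adapts the retrieved proof's technique to the new theorem.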