Retrieval-Augmented Language Models are Mimetic Theorem Provers

Wenjie Yang, Ruiyuan Huang, Jiaxing Guo, Zicheng Lyu, Tongshan Xu, Shengzhong Zhang, Lun Du, Da Zheng, Zengfeng Huang


Abstract
Large language models have demonstrated considerable capabilities in various mathematical tasks, yet they often fall short in rigorous, proof-based reasoning essential for research-level mathematics. Retrieval-augmented generation presents a promising direction for enhancing these capabilities. This paper systematically explores RAG for natural language theorem proving, revealing that LLMs, when augmented with retrieved proofs rather than just theorems, can function as potent mimetic theorem provers: these models can effectively generalize proof techniques found in unstructured retrieved contexts to construct correct proofs for novel theorems. Building upon this finding, we introduce Dual RAG, a simple yet effective RAG framework. Dual RAG employs LLMs to identify underlying reasoning challenges within theorems, augmenting both queries and document contexts to improve retrieval performance. Our experiments show that Dual RAG achieves substantial improvements in retrieval performance, with gains of up to 34.19%. Expert evaluations further confirm that these retrieval enhancements directly translate into higher quality proof generation. Notably, when integrated with the arXiv API, Dual RAG demonstrates the ability to prove research-level theorems in theoretical machine learning, highlighting its strong potential as a foundational element for a practical mathematical copilot.
Anthology ID:
2025.findings-emnlp.1162
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
21301–21313
Language:
URL:
https://aclanthology.org/2025.findings-emnlp.1162/
DOI:
Bibkey:
Cite (ACL):
Wenjie Yang, Ruiyuan Huang, Jiaxing Guo, Zicheng Lyu, Tongshan Xu, Shengzhong Zhang, Lun Du, Da Zheng, and Zengfeng Huang. 2025. Retrieval-Augmented Language Models are Mimetic Theorem Provers. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 21301–21313, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Retrieval-Augmented Language Models are Mimetic Theorem Provers (Yang et al., Findings 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.findings-emnlp.1162.pdf
Checklist:
 2025.findings-emnlp.1162.checklist.pdf