@inproceedings{wang-etal-2025-coderag,
title = "{C}ode{RAG}-Bench: Can Retrieval Augment Code Generation?",
author = "Wang, Zora Zhiruo and
Asai, Akari and
Yu, Xinyan Velocity and
Xu, Frank F. and
Xie, Yiqing and
Neubig, Graham and
Fried, Daniel",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.176/",
doi = "10.18653/v1/2025.findings-naacl.176",
pages = "3199--3214",
ISBN = "979-8-89176-195-7",
abstract = "While language models (LMs) excel at generating code, many programs are difficult to generate using only parametric knowledge. Despite the success of retrieval-augmented generation (RAG) in text-centric tasks, its potential for code generation remains under-explored. This work introduces CodeRAG-bench, a holistic retrieval-augmented code generation benchmark covering tasks like basic programming, open-domain, and repository-level problems and provides reproducible evaluations on both retrieval and end-to-end code generation performance. We further create a diverse, open datastore for code retrieval, aggregating sources such as competition solutions, tutorials, library documentation, StackOverflow posts, and GitHub repositories. Based on CodeRAG-bench, we conduct large-scale evaluations of 10 retrievers and 10 LMs and systematically analyze when retrieval can benefit code generation models and identify remaining challenges. We find that while retrieving high-quality contexts improves code generation, retrievers often struggle to fetch useful contexts, and generators face limitations in using those contexts effectively. We hope CodeRAG-bench encourages further development in code-oriented RAG methods."
}