CodeFusion: A Pre-trained Diffusion Model for Code Generation

Mukul Singh; José Cambronero; Sumit Gulwani; Vu Le; Carina Negreanu; Gust Verbruggen

doi:10.18653/v1/2023.emnlp-main.716

Abstract

Imagine a developer who can only change their last line of code—how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance in diversity versus quality.

Anthology ID: 2023.emnlp-main.716
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11697–11708
URL: https://aclanthology.org/2023.emnlp-main.716
DOI: 10.18653/v1/2023.emnlp-main.716
Cite (ACL): Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, and Gust Verbruggen. 2023. CodeFusion: A Pre-trained Diffusion Model for Code Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 11697–11708, Singapore. Association for Computational Linguistics.
Cite (Informal): CodeFusion: A Pre-trained Diffusion Model for Code Generation (Singh et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.716.pdf
Video: https://aclanthology.org/2023.emnlp-main.716.mp4
