Fine-tuning Language Models for Joint Rewriting and Completion of Code with Potential Bugs

Dingmin Wang, Jinman Zhao, Hengzhi Pei, Samson Tan, Sheng Zha


Abstract
Handling drafty partial code remains a notable challenge in real-time code suggestion applications. Previous work has demonstrated shortcomings of large language models of code (CodeLLMs) in completing partial code with potential bugs. In this study, we view partial code as implementation hints and fine-tune CodeLLMs to jointly rewrite and complete partial code into functional full programs. We explore two strategies: one-pass generation and multi-pass iterative refinement. We construct new training and testing datasets using semantic-altering code transformations and iterative self-generations. We conduct comprehensive experiments over three representative open-source CodeLLMs – InCoder, CodeGen, and StarCoder. Results show that CodeLLMs fine-tuned using our approach achieve superior pass rates compared to the previous baselines across existing and newly created benchmarks, effectively handle both potentially buggy and clean code, and largely preserve the integrity of the original partial implementations. We further present findings on the properties of the potential bugs we tested and on the design choices of our methods.
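The abstract mentions constructing datasets via "semantic-altering code transformations". As a minimal, hypothetical sketch (not the paper's actual transformation set), such a transformation can be implemented as an AST rewrite that flips a comparison operator, turning a clean snippet into a potentially buggy variant:

```python
import ast

class FlipComparison(ast.NodeTransformer):
    """Hypothetical semantic-altering transform: replace `<` with `<=`,
    changing program behavior while keeping the code syntactically valid."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.LtE() if isinstance(op, ast.Lt) else op
                    for op in node.ops]
        return node

clean = "def first_n(xs, n):\n    return [x for i, x in enumerate(xs) if i < n]"
tree = FlipComparison().visit(ast.parse(clean))
buggy = ast.unparse(ast.fix_missing_locations(tree))
print(buggy)  # the `i < n` bound becomes `i <= n`, an off-by-one bug
```

Applied at scale over clean training programs, transformations of this kind can pair each buggy partial snippet with its original, providing supervision for joint rewriting and completion.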
Anthology ID:
2024.findings-acl.938
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
15854–15868
URL:
https://aclanthology.org/2024.findings-acl.938
DOI:
10.18653/v1/2024.findings-acl.938
Cite (ACL):
Dingmin Wang, Jinman Zhao, Hengzhi Pei, Samson Tan, and Sheng Zha. 2024. Fine-tuning Language Models for Joint Rewriting and Completion of Code with Potential Bugs. In Findings of the Association for Computational Linguistics: ACL 2024, pages 15854–15868, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning Language Models for Joint Rewriting and Completion of Code with Potential Bugs (Wang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.938.pdf