CoT-Edit: Reinforcement Learning of Chain-of-Thought Reasoning for Code Edit Suggestion

Wuya Chen; Yihao Yang; Yue Lin

doi:10.18653/v1/2026.findings-acl.1407

Abstract

Code edit suggestion, which encompasses modifying, refactoring, and maintaining existing code, represents the most frequent software development activity and has become a focal point for AI-powered tools. Traditional methods translate explicit natural language instructions into code edits, while pattern-based approaches learn from users’ historical editing patterns to provide style-consistent and more accurate suggestions. However, these pattern-based methods still face two critical challenges: (1) difficulty handling edits that demand deep contextual reasoning, and (2) lack of interpretability in editing decisions. To address these challenges, we propose CoT-Edit, a reinforcement learning framework that guides LLMs to discover chain-of-thought (CoT) reasoning paths for code editing without requiring human-annotated CoT data. Specifically, we design a multi-step reasoning framework that enables (1) analysis-guided code editing and (2) seamless switching between CoT and non-CoT inference modes. Building on this, we introduce Edit-Aware Reward Modeling (EARM), a fine-grained diff-based reward approach for effective learning. Furthermore, we discover a LoRA merging strategy that enhances model generalization. Evaluations on an industrial dataset show that our approach achieves 60.2% edit accuracy, outperforming all strong baselines. Online A/B tests further confirm its effectiveness in production. Code is available at https://github.com/202230483077yyh/CoT-Edit.
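
The abstract does not specify how EARM scores an edit, so the following is only a rough sketch of what a diff-based reward could look like, not the authors' method. It assumes a reference edit is available and scores a predicted edit by the F1 overlap between the line-level changes each one makes to the original code. The function names `changed_lines` and `diff_reward` are hypothetical, and the F1 scoring rule is an assumption chosen for illustration.

```python
import difflib


def changed_lines(before: str, after: str) -> set[tuple[str, str]]:
    """Return the set of (op, line) pairs that turn `before` into `after`.

    `op` is "+" for an added line and "-" for a removed line. Using a set
    ignores repeated identical changes, a simplification for this sketch.
    """
    ops = set()
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        # ndiff also emits "  " (unchanged) and "? " (hint) lines; skip them.
        if line.startswith("+ ") or line.startswith("- "):
            ops.add((line[0], line[2:]))
    return ops


def diff_reward(original: str, predicted: str, reference: str) -> float:
    """Hypothetical reward in [0, 1]: F1 overlap of predicted vs. reference changes."""
    pred = changed_lines(original, predicted)
    ref = changed_lines(original, reference)
    if not pred and not ref:
        return 1.0  # both leave the code untouched
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    original = "x\ny\nz"
    reference = "x\ny2\nz2"
    partial = "x\ny2\nz"  # applies only one of the two reference changes
    print(diff_reward(original, reference, reference))
    print(diff_reward(original, partial, reference))
    print(diff_reward(original, original, reference))
```

Compared with an exact-match reward, a score of this kind gives partial credit to a prediction that makes some of the right changes, which is one plausible reading of "fine-grained" in the abstract.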

Anthology ID: 2026.findings-acl.1407
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 28219–28234
URL: https://aclanthology.org/2026.findings-acl.1407/
DOI: 10.18653/v1/2026.findings-acl.1407
Cite (ACL): Wuya Chen, Yihao Yang, and Yue Lin. 2026. CoT-Edit: Reinforcement Learning of Chain-of-Thought Reasoning for Code Edit Suggestion. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28219–28234, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CoT-Edit: Reinforcement Learning of Chain-of-Thought Reasoning for Code Edit Suggestion (Chen et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1407.pdf
Checklist: 2026.findings-acl.1407.checklist.pdf
