Alignment with Fill-In-the-Middle for Enhancing Code Generation

Houxing Ren, Zimu Lu, Weikang Shi, Haotian Hou, Yunqiao Yang, Ke Wang, Aojun Zhou, Junting Pan, Mingjie Zhan, Hongsheng Li


Abstract
The code generation capabilities of Large Language Models (LLMs) have advanced applications like tool invocation and problem-solving. However, improving performance in code-related tasks remains challenging due to limited training data that is verifiable with accurate test cases. While Direct Preference Optimization (DPO) has shown promise, existing methods for generating test cases still face limitations. In this paper, we propose a novel approach that splits code snippets into smaller, granular blocks, creating more diverse DPO pairs from the same test cases. Additionally, we introduce the Abstract Syntax Tree (AST) splitting and curriculum training method to enhance the DPO training. Our approach demonstrates significant improvements in code generation tasks, as validated by experiments on benchmark datasets such as HumanEval (+), MBPP (+), APPS, LiveCodeBench, and BigCodeBench. Code and data are available at https://github.com/SenseLLM/StructureCoder.
Anthology ID:
2025.emnlp-main.419
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8315–8331
Language:
URL:
https://aclanthology.org/2025.emnlp-main.419/
DOI:
Bibkey:
Cite (ACL):
Houxing Ren, Zimu Lu, Weikang Shi, Haotian Hou, Yunqiao Yang, Ke Wang, Aojun Zhou, Junting Pan, Mingjie Zhan, and Hongsheng Li. 2025. Alignment with Fill-In-the-Middle for Enhancing Code Generation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8315–8331, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Alignment with Fill-In-the-Middle for Enhancing Code Generation (Ren et al., EMNLP 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.emnlp-main.419.pdf
Checklist:
 2025.emnlp-main.419.checklist.pdf