Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models

Ziyang Luo; Kaixin Li; Hongzhan Lin; Yuchen Tian; Mohan Kankanhalli; Jing Ma

doi:10.18653/v1/2025.acl-long.14

Abstract

Data synthesis has become a crucial research area in large language models (LLMs), especially for generating high-quality instruction fine-tuning data to enhance downstream performance. In code generation, a key application of LLMs, manual annotation of code instruction data is costly. Recent methods, such as Code Evol-Instruct and OSS-Instruct, leverage LLMs to synthesize large-scale code instruction data, significantly improving LLM coding capabilities. However, these approaches face limitations due to unidirectional synthesis and randomness-driven generation, which restrict data quality and diversity. To overcome these challenges, we introduce Tree-of-Evolution (ToE), a novel framework that models the code instruction synthesis process with a tree structure, exploring multiple evolutionary paths to alleviate the constraints of unidirectional generation. Additionally, we propose optimization-driven evolution, which refines each generation step based on the quality of the previous iteration. Experimental results across five widely used coding benchmarks (HumanEval, MBPP, EvalPlus, LiveCodeBench, and BigCodeBench) demonstrate that base models fine-tuned on just 75k samples synthesized by our method achieve comparable or superior performance to the state-of-the-art open-weight Code LLM, Qwen2.5-Coder-Instruct, which was fine-tuned on millions of samples.

Anthology ID: 2025.acl-long.14
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 297–316
URL: https://aclanthology.org/2025.acl-long.14/
DOI: 10.18653/v1/2025.acl-long.14
Cite (ACL): Ziyang Luo, Kaixin Li, Hongzhan Lin, Yuchen Tian, Mohan Kankanhalli, and Jing Ma. 2025. Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 297–316, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Tree-of-Evolution: Tree-Structured Instruction Evolution for Code Generation in Large Language Models (Luo et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.14.pdf
