Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study

Bowen Li; Wenhan Wu; Ziwei Tang; Lin Shi; John Yang; Jinyang Li; Shunyu Yao; Chen Qian; Binyuan Hui; Qicheng Zhang; Zhiyin Yu; He Du; Ping Yang; Dahua Lin; Chao Peng; Kai Chen

Abstract

Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of coding, such as single-file code generation or repository issue debugging, and fall short of measuring the full spectrum of challenges raised by real-world programming activities. In this case study, we explore the performance of LLMs across the entire software development lifecycle with DevEval, which encompasses software design, environment setup, implementation, acceptance testing, and unit testing. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.

Anthology ID: 2025.coling-main.502
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 7511–7531
URL: https://aclanthology.org/2025.coling-main.502/
Cite (ACL): Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, and Kai Chen. 2025. Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7511–7531, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study (Li et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.502.pdf
