John Yang
2025
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study
Bowen Li | Wenhan Wu | Ziwei Tang | Lin Shi | John Yang | Jinyang Li | Shunyu Yao | Chen Qian | Binyuan Hui | Qicheng Zhang | Zhiyin Yu | He Du | Ping Yang | Dahua Lin | Chao Peng | Kai Chen
Proceedings of the 31st International Conference on Computational Linguistics
Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of coding, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. In this case study, we explore the performance of LLMs across the entire software development lifecycle with DevEval, encompassing stages including software design, environment setup, implementation, acceptance testing, and unit testing. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
2024
Referral Augmentation for Zero-Shot Information Retrieval
Michael Tang | Shunyu Yao | John Yang | Karthik Narasimhan
Findings of the Association for Computational Linguistics: ACL 2024
We propose Referral-Augmented Retrieval (RAR), a simple technique that concatenates document indices with referrals: text from other documents that cite or link to the given document. We find that RAR provides significant performance gains for tasks across paper retrieval, entity retrieval, and open-domain question-answering in both zero-shot and in-domain (e.g., fine-tuned) settings. We examine how RAR provides especially strong improvements on more structured tasks, and can greatly outperform generative text expansion techniques such as DocT5Query and Query2Doc, with a 37% and 21% absolute improvement on ACL paper retrieval, respectively. We also compare three ways to aggregate referrals for RAR. Overall, we believe RAR can help revive and re-contextualize the classic information retrieval idea of using anchor texts to improve the representations of documents in a wide variety of corpuses in the age of neural retrieval.
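The core idea in the abstract above, indexing each document together with text from other documents that cite or link to it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy documents, the helper names (`augment`, `score`, `rank`), and the bag-of-words cosine scorer are hypothetical and are not the retrievers, data, or aggregation variants evaluated in the paper. Only plain concatenation of referrals is shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


def augment(doc_text, referrals):
    """Concatenate a document's own text with its referral texts."""
    return " ".join([doc_text, *referrals])


def score(query, text):
    """Cosine similarity between bag-of-words vectors of query and text."""
    q, d = Counter(tokenize(query)), Counter(tokenize(text))
    dot = sum(q[t] * d[t] for t in q if t in d)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def rank(query, index):
    """Return document ids sorted by descending score against the query."""
    return sorted(index, key=lambda doc_id: score(query, index[doc_id]), reverse=True)


# Toy corpus: d1 never uses the words a later reader would search for.
docs = {
    "d1": "we propose a new attention architecture for sequence transduction",
    "d2": "a study of convolutional networks for image classification",
}
# Hypothetical referrals: sentences from other documents that cite each one.
referrals = {
    "d1": ["the transformer model introduced by this paper"],
    "d2": [],
}

plain_index = dict(docs)
rar_index = {doc_id: augment(text, referrals[doc_id]) for doc_id, text in docs.items()}

query = "transformer model"
print(score(query, plain_index["d1"]), score(query, rar_index["d1"]))
print(rank(query, rar_index))
```

In this toy setup the query shares no terms with the original text of `d1`, so the plain index cannot match it, while the referral-augmented index can, because the citing sentence supplies the vocabulary that readers actually use for the document.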