The Program Testing Ability of Large Language Models for Code

Weimin Xiong, Yiwen Guo, Hao Chen


Abstract
Recent developments in large language models (LLMs) for code, such as CodeX and CodeT5+, show promise for achieving code intelligence. Their ability to synthesize programs for pre-defined algorithmic coding tasks has been intensively tested and verified on datasets including HumanEval and MBPP. Yet, considering their broad scope of applications, evaluation of these LLMs from perspectives beyond program synthesis is also called for. In this paper, we explore their ability to generate test cases automatically. We report intriguing observations and show how the quality of their generated test cases can be improved. Following recent work that uses generated test cases to enhance program synthesis, we further leverage our findings to improve the quality of the synthesized programs, achieving +11.77% and +4.22% higher code pass rates on HumanEval+ compared with the GPT-3.5-turbo baseline and the recent state of the art, respectively. Our code is publicly available at https://github.com/asdasxzxcq/TestCaseGen.
Anthology ID:
2024.emnlp-industry.3
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
23–34
URL:
https://aclanthology.org/2024.emnlp-industry.3
Cite (ACL):
Weimin Xiong, Yiwen Guo, and Hao Chen. 2024. The Program Testing Ability of Large Language Models for Code. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 23–34, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
The Program Testing Ability of Large Language Models for Code (Xiong et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.3.pdf