BibTeX
@inproceedings{babe-etal-2024-studenteval,
title = "{S}tudent{E}val: A Benchmark of Student-Written Prompts for Large Language Models of Code",
author = "Babe, Hannah McLean and
Nguyen, Sydney and
Zi, Yangtian and
Guha, Arjun and
Feldman, Molly Q and
Anderson, Carolyn Jane",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.501/",
doi = "10.18653/v1/2024.findings-acl.501",
pages = "8452--8474",
abstract = "Code LLMs have the potential to make it easier for non-experts to understand and write code. However, current CodeLLM benchmarks rely on a single expert-written prompt per problem, making it hard to generalize their success to non-expert users. In this paper, we present a new natural-language-to-code benchmark of prompts written by a key population of non-experts: beginning programmers. StudentEval contains 1,749 prompts written by 80 students who have only completed one introductory Python course. StudentEval contains numerous non-expert prompts describing the same problem, enabling exploration of key factors in prompt success. We use StudentEval to evaluate 12 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. Our analysis of student prompting strategies reveals that nondeterministic LLM sampling can mislead students about the quality of their descriptions, a finding with key implications for Code LLMs in education."
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="babe-etal-2024-studenteval">
<titleInfo>
<title>StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hannah</namePart>
<namePart type="given">McLean</namePart>
<namePart type="family">Babe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sydney</namePart>
<namePart type="family">Nguyen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yangtian</namePart>
<namePart type="family">Zi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Arjun</namePart>
<namePart type="family">Guha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Molly</namePart>
<namePart type="given">Q</namePart>
<namePart type="family">Feldman</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="given">Jane</namePart>
<namePart type="family">Anderson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2024</title>
</titleInfo>
<name type="personal">
<namePart type="given">Lun-Wei</namePart>
<namePart type="family">Ku</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andre</namePart>
<namePart type="family">Martins</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vivek</namePart>
<namePart type="family">Srikumar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Bangkok, Thailand</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Code LLMs have the potential to make it easier for non-experts to understand and write code. However, current CodeLLM benchmarks rely on a single expert-written prompt per problem, making it hard to generalize their success to non-expert users. In this paper, we present a new natural-language-to-code benchmark of prompts written by a key population of non-experts: beginning programmers. StudentEval contains 1,749 prompts written by 80 students who have only completed one introductory Python course. StudentEval contains numerous non-expert prompts describing the same problem, enabling exploration of key factors in prompt success. We use StudentEval to evaluate 12 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. Our analysis of student prompting strategies reveals that nondeterministic LLM sampling can mislead students about the quality of their descriptions, a finding with key implications for Code LLMs in education.</abstract>
<identifier type="citekey">babe-etal-2024-studenteval</identifier>
<identifier type="doi">10.18653/v1/2024.findings-acl.501</identifier>
<location>
<url>https://aclanthology.org/2024.findings-acl.501/</url>
</location>
<part>
<date>2024-08</date>
<extent unit="page">
<start>8452</start>
<end>8474</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code
%A Babe, Hannah McLean
%A Nguyen, Sydney
%A Zi, Yangtian
%A Guha, Arjun
%A Feldman, Molly Q.
%A Anderson, Carolyn Jane
%Y Ku, Lun-Wei
%Y Martins, Andre
%Y Srikumar, Vivek
%S Findings of the Association for Computational Linguistics: ACL 2024
%D 2024
%8 August
%I Association for Computational Linguistics
%C Bangkok, Thailand
%F babe-etal-2024-studenteval
%X Code LLMs have the potential to make it easier for non-experts to understand and write code. However, current CodeLLM benchmarks rely on a single expert-written prompt per problem, making it hard to generalize their success to non-expert users. In this paper, we present a new natural-language-to-code benchmark of prompts written by a key population of non-experts: beginning programmers. StudentEval contains 1,749 prompts written by 80 students who have only completed one introductory Python course. StudentEval contains numerous non-expert prompts describing the same problem, enabling exploration of key factors in prompt success. We use StudentEval to evaluate 12 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. Our analysis of student prompting strategies reveals that nondeterministic LLM sampling can mislead students about the quality of their descriptions, a finding with key implications for Code LLMs in education.
%R 10.18653/v1/2024.findings-acl.501
%U https://aclanthology.org/2024.findings-acl.501/
%U https://doi.org/10.18653/v1/2024.findings-acl.501
%P 8452-8474
Markdown (Informal)
[StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code](https://aclanthology.org/2024.findings-acl.501/) (Babe et al., Findings 2024)
ACL
Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, and Carolyn Jane Anderson. 2024. StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8452–8474, Bangkok, Thailand. Association for Computational Linguistics.
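
As a reader's aid, here is a minimal sketch (not the authors' released evaluation code) of the standard unbiased pass@k estimator commonly used when scoring prompts against nondeterministically sampled completions, as the abstract describes; the sample counts below are illustrative toy numbers, not results from the paper.

```python
# Minimal sketch: unbiased pass@k estimate for one prompt, given n sampled
# completions of which c pass the problem's tests. Toy numbers only; the
# StudentEval paper's own evaluation pipeline is not reproduced here.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k completions drawn
    (without replacement) from the n sampled completions is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 20 completions sampled for one student-written prompt, 7 pass.
print(f"pass@1  = {pass_at_k(20, 7, 1):.3f}")   # 0.350
print(f"pass@10 = {pass_at_k(20, 7, 10):.3f}")
```

Because a single sample is noisy (pass@1 here is only 0.35), one lucky completion can make a weak prompt look good, which is the misleading-feedback effect the abstract highlights.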