Mansi Uniyal
2024
One-to-many testing for code generation from (just) natural language
Mansi Uniyal | Mukul Singh | Gust Verbruggen | Sumit Gulwani | Vu Le
Findings of the Association for Computational Linguistics: EMNLP 2024
MBPP is a popular dataset for evaluating code generation from natural language. Despite its popularity, it has three problems: (1) it relies on providing test cases to generate the right signature, (2) there is poor alignment between the instruction and the evaluation test cases, and (3) it is subject to contamination, because the exact phrasing is present in training datasets. We adapt MBPP to emphasize generating code from just natural language by (1) removing ambiguity about the semantics of the task from the descriptions, and (2) evaluating generated code on multiple sets of assertions to account for ambiguity in the syntax. We compare popular open- and closed-weight models on the original (MBPP) and adapted (MBUPP) datasets.
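The idea of evaluating generated code on multiple sets of assertions can be illustrated with a minimal sketch. The abstract does not spell out the acceptance rule, so the sketch assumes that a candidate counts as correct if it satisfies every assertion in at least one set. The function names (`passes_assertion_set`, `passes_one_to_many`), the example task, and the `min_max` signature are hypothetical and are not taken from the paper or from the dataset.

```python
from typing import List


def passes_assertion_set(candidate_src: str, assertions: List[str]) -> bool:
    """Return True if the candidate satisfies every assertion in one set."""
    namespace: dict = {}
    try:
        # Define the candidate's functions in a fresh namespace.
        # NOTE: exec runs arbitrary code; real evaluation needs a sandbox.
        exec(candidate_src, namespace)
        for assertion in assertions:
            exec(assertion, namespace)
    except Exception:
        # Covers AssertionError as well as crashes and missing names.
        return False
    return True


def passes_one_to_many(candidate_src: str, assertion_sets: List[List[str]]) -> bool:
    """Assumed rule: accept if at least one assertion set passes in full."""
    return any(passes_assertion_set(candidate_src, s) for s in assertion_sets)


# Hypothetical task: "return the smallest and largest element of a list".
# The description does not say whether the result is a list or a tuple,
# so each assertion set encodes one acceptable output format.
candidate = "def min_max(xs):\n    return (min(xs), max(xs))\n"
assertion_sets = [
    ["assert min_max([3, 1, 2]) == [1, 3]"],  # list-returning variant
    ["assert min_max([3, 1, 2]) == (1, 3)"],  # tuple-returning variant
]

accepted = passes_one_to_many(candidate, assertion_sets)
```

Under a single fixed assertion set, the tuple-returning candidate above would be rejected whenever the reference happened to use a list, even though it implements the described behavior. Checking several sets is one way to keep that kind of syntactic choice from counting as a failure.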