SpreadNaLa: A Naturalistic Code Generation Evaluation Dataset of Spreadsheet Formulas

Sebastian Schuster, Ayesha Ansar, Om Agarwal, Vera Demberg


Abstract
Automatic generation of code from natural language descriptions has emerged as one of the main use cases of large language models (LLMs). This has also led to a proliferation of datasets to track progress in the reliability of code generation models, including domains such as programming challenges and common data science tasks. However, existing datasets primarily target the use of code generation models to aid expert programmers in writing code. In this work, we consider a domain of code generation which is more frequently used by users without sophisticated programming skills: translating English descriptions to spreadsheet formulas that can be used to do everyday data processing tasks. We extract naturalistic instructions from StackOverflow posts and manually verify and standardize the corresponding spreadsheet formulas. We use this dataset to evaluate an off-the-shelf code generation model (GPT 3.5 text-davinci-003) as well as recently proposed pragmatic code generation procedures and find that Code Reviewer reranking (Zhang et al., 2022) performs best among the evaluated methods but still frequently generates formulas that differ from human-generated ones.
Anthology ID:
2024.lrec-main.1323
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
15216–15225
URL:
https://aclanthology.org/2024.lrec-main.1323
Cite (ACL):
Sebastian Schuster, Ayesha Ansar, Om Agarwal, and Vera Demberg. 2024. SpreadNaLa: A Naturalistic Code Generation Evaluation Dataset of Spreadsheet Formulas. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15216–15225, Torino, Italia. ELRA and ICCL.
Cite (Informal):
SpreadNaLa: A Naturalistic Code Generation Evaluation Dataset of Spreadsheet Formulas (Schuster et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1323.pdf