Siyuan Wu
2024
NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries
Wei Zhao | Zhitao Hou | Siyuan Wu | Yan Gao | Haoyu Dong | Yao Wan | Hongyu Zhang | Yulei Sui | Haidong Zhang
Findings of the Association for Computational Linguistics: EACL 2024
Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis. However, crafting formulas on spreadsheets remains a tedious and error-prone task for many end-users, particularly when dealing with complex operations. To alleviate the burden associated with writing spreadsheet formulas, this paper introduces a novel benchmark task called NL2Formula, which aims to generate executable formulas grounded on a spreadsheet table, given a Natural Language (NL) query as input. To accomplish this, we construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions. We realize the NL2Formula task by providing a sequence-to-sequence baseline implementation called fCoder. Experimental results validate the effectiveness of fCoder, demonstrating its superior performance compared to the baseline models. We also compare fCoder with an initial GPT-3.5 model (i.e., text-davinci-003). Lastly, through in-depth error analysis, we identify potential challenges in the NL2Formula task and advocate for further investigation.