Natural SQL: Making SQL Easier to Infer from Natural Language Specifications

Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R. Woodward, John Drake, Qiaofu Zhang


Abstract
Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL while simplifying the queries as follows: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, and JOIN ON, which usually lack counterparts in the text descriptions; (2) removing the need for nested subqueries and set operators; and (3) making schema linking easier by reducing the required number of schema items. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs and significantly improves the performance of several previous SOTA models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries and achieves new state-of-the-art execution accuracy.
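To make point (1) concrete, here is a minimal sketch of why a FROM clause can be dropped when every column is written as `table.column`: the table list can be recovered from the column references. The helper `infer_tables` and the example query string are hypothetical illustrations. They are not taken from the NatSQL codebase, and the query only loosely follows the FROM-less style the abstract describes rather than verified NatSQL syntax.

```python
import re

# Hypothetical helper for illustration only; not part of ygan/natsql.
def infer_tables(from_less_query: str) -> list[str]:
    """Collect table names from `table.column` / `table.*` references,
    keeping the order in which they first appear."""
    seen: list[str] = []
    # A table name starts with a letter or underscore and is followed by
    # a dot and either a column name or `*`.
    for table in re.findall(r"\b([A-Za-z_]\w*)\.(?:\w+|\*)", from_less_query):
        if table not in seen:
            seen.append(table)
    return seen

# Assumed FROM-less query in the spirit of the abstract: no FROM, JOIN ON,
# GROUP BY, or HAVING keywords appear in the text.
query = "SELECT student.name WHERE count(has_pet.*) > 1"
tables = infer_tables(query)
print(tables)
```

Turning such a query back into executable SQL also requires choosing join conditions from the schema's foreign keys and restoring GROUP BY and HAVING, which this sketch does not attempt.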
Anthology ID:
2021.findings-emnlp.174
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2030–2042
URL:
https://aclanthology.org/2021.findings-emnlp.174
DOI:
10.18653/v1/2021.findings-emnlp.174
Cite (ACL):
Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R. Woodward, John Drake, and Qiaofu Zhang. 2021. Natural SQL: Making SQL Easier to Infer from Natural Language Specifications. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2030–2042, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Natural SQL: Making SQL Easier to Infer from Natural Language Specifications (Gan et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.174.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.174.mp4
Code:
ygan/natsql
Data:
WikiSQL