T5QL: Taming language models for SQL generation

Samuel David Arcadinho; David Aparicio; Hugo Veiga; Antonio Alegria

doi:10.18653/v1/2022.gem-1.23

Abstract

Automatic SQL generation has been an active research area, aiming at streamlining access to databases by letting users state their intent in natural language instead of writing SQL. Current SOTA methods for semantic parsing depend on LLMs to achieve high predictive accuracy on benchmark datasets. This reduces their applicability, since LLMs require expensive GPUs. Furthermore, SOTA methods are ungrounded and thus not guaranteed to always generate valid SQL. Here we propose T5QL, a new SQL generation method that improves performance on benchmark datasets when using smaller LMs, namely T5-Base, by 13pp when compared against SOTA methods. Additionally, T5QL is guaranteed to always output valid SQL by using a context-free grammar to constrain SQL generation. Finally, we show that dividing semantic parsing into two tasks, candidate SQL generation and candidate re-ranking, is a promising research avenue that can reduce the need for large LMs.
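The idea of constraining generation with a grammar can be illustrated with a minimal Python sketch. This is not the T5QL implementation: the schema, the vocabulary, the `toy_scores` function standing in for a language model, and the `allowed_next` helper are all hypothetical, and the toy grammar is simple enough to be encoded by position, whereas a real system would drive the mask from a parser for a full context-free SQL grammar. The sketch only shows the general mechanism: at each decoding step, tokens the grammar does not permit are excluded before the highest-scoring token is chosen.

```python
# Toy schema and vocabulary (hypothetical, for illustration only).
COLUMNS = ["name", "age"]
TABLES = ["users"]
VALUES = ["18", "'bob'"]
EOS = "<eos>"
VOCAB = ["SELECT", "FROM", "WHERE", "=", EOS] + COLUMNS + TABLES + VALUES

# Stand-in for model scores: a fixed preference per token, ignoring context.
# It deliberately favours "users", which is invalid at most positions.
PREFERENCE = {"users": 3.0, "age": 2.0, "WHERE": 1.0, "18": 0.5}


def toy_scores(prefix):
    """Return a score for every vocabulary token (assumed, not a real LM)."""
    return {tok: PREFERENCE.get(tok, 0.0) for tok in VOCAB}


def allowed_next(prefix):
    """Tokens the toy grammar permits after `prefix`.

    Grammar: SELECT col FROM table [WHERE col = value] <eos>
    """
    by_position = [
        {"SELECT"}, set(COLUMNS), {"FROM"}, set(TABLES),
        {"WHERE", EOS}, set(COLUMNS), {"="}, set(VALUES), {EOS},
    ]
    return by_position[len(prefix)] if len(prefix) < len(by_position) else set()


def constrained_greedy():
    """Greedy decoding restricted to grammar-allowed tokens at each step."""
    prefix = []
    while True:
        scores = toy_scores(prefix)
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(allowed_next(prefix)), key=lambda t: scores[t])
        if best == EOS:
            return " ".join(prefix)
        prefix.append(best)


def unconstrained_greedy(steps=4):
    """Greedy decoding over the whole vocabulary, for contrast."""
    prefix = []
    for _ in range(steps):
        scores = toy_scores(prefix)
        prefix.append(max(sorted(VOCAB), key=lambda t: scores[t]))
    return " ".join(prefix)


if __name__ == "__main__":
    print(constrained_greedy())
    print(unconstrained_greedy())
```

With these toy scores, the unconstrained decoder repeats its favourite token and never produces a query, while the constrained decoder can only emit strings the grammar accepts. The same masking step could be applied inside beam search to produce several valid candidates for a separate re-ranking stage.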

Anthology ID: 2022.gem-1.23
Volume: Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Antoine Bosselut, Khyathi Chandu, Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Yacine Jernite, Jekaterina Novikova, Laura Perez-Beltrachini
Venue: GEM
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 276–286
URL: https://aclanthology.org/2022.gem-1.23/
DOI: 10.18653/v1/2022.gem-1.23
Cite (ACL): Samuel David Arcadinho, David Aparicio, Hugo Veiga, and Antonio Alegria. 2022. T5QL: Taming language models for SQL generation. In Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 276–286, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): T5QL: Taming language models for SQL generation (Arcadinho et al., GEM 2022)
PDF: https://aclanthology.org/2022.gem-1.23.pdf
Video: https://aclanthology.org/2022.gem-1.23.mp4
