DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models

Mohammadreza Pourreza; Davood Rafiei

doi:10.18653/v1/2024.findings-emnlp.481

Abstract

Leading models for the text-to-SQL task rely heavily on proprietary Large Language Models (LLMs), raising concerns over data privacy. Closing the performance gap between small open-source models and large proprietary models is crucial to reducing this reliance. To this end, we introduce a novel two-stage fine-tuning approach that decomposes the task into two simpler tasks. Through comprehensive evaluation on three large cross-domain datasets and two small LLMs, we show that this approach improves execution accuracy by 3 to 7 percent, effectively aligning the performance of open-source models with that of their proprietary counterparts. Our proposed method achieves 60.31% execution accuracy on the Bird hold-out test set, the highest performance among methods using 7B-parameter models.

Anthology ID: 2024.findings-emnlp.481
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8212–8220
URL: https://aclanthology.org/2024.findings-emnlp.481/
DOI: 10.18653/v1/2024.findings-emnlp.481
Cite (ACL): Mohammadreza Pourreza and Davood Rafiei. 2024. DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8212–8220, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models (Pourreza & Rafiei, Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.481.pdf
