ReFSQL: A Retrieval-Augmentation Framework for Text-to-SQL Generation

Kun Zhang; Xiexiong Lin; Yuanzhuo Wang; Xin Zhang; Fei Sun; Cen Jianhe; Hexiang Tan; Xuhui Jiang; Huawei Shen (沈华伟)

doi:10.18653/v1/2023.findings-emnlp.48

Abstract

Text-to-SQL is the task of translating natural language questions into SQL queries. Existing methods directly align natural language with SQL and train one encoder-decoder-based model to fit all questions. However, they underestimate the inherent structural characteristics of SQL, as well as the gap between specific structure knowledge and general knowledge. This leads to structure errors in the generated SQL. To address these challenges, we propose a retrieval-augmentation framework, namely ReFSQL. It contains two parts: a structure-enhanced retriever and a generator. The structure-enhanced retriever is designed to identify samples with comparable specific knowledge in an unsupervised way. Subsequently, we incorporate the retrieved samples’ SQL into the input, enabling the model to acquire prior knowledge of similar SQL grammar. To further bridge the gap between specific and general knowledge, we present a Mahalanobis contrastive learning method, which facilitates the transfer of the sample toward the specific knowledge distribution constructed by the retrieved samples. Experimental results on five datasets verify the effectiveness of our approach in improving the accuracy and robustness of Text-to-SQL generation. Our framework improves performance when combined with many other backbone models (including the 11B Flan-T5) and achieves state-of-the-art performance among existing methods that employ the fine-tuning approach.
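
The abstract names a Mahalanobis contrastive learning method but does not give its formula. As a rough illustration of the underlying distance only (not the paper's loss or implementation), the sketch below computes the standard Mahalanobis distance between one sample embedding and the distribution formed by a set of retrieved-sample embeddings. The function names, the plain-list embedding format, and the small ridge term added to keep the covariance invertible are all assumptions made for this example.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, ridge=1e-6):
    """Sample covariance; the ridge on the diagonal is an assumption
    added here so the matrix stays invertible."""
    n, mu = len(rows), mean_vector(rows)
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / (n - 1)
    for i in range(dim):
        cov[i][i] += ridge
    return cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(sample, retrieved, ridge=1e-6):
    """Distance from `sample` to the distribution of `retrieved` vectors:
    sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    mu = mean_vector(retrieved)
    cov = covariance(retrieved, ridge)
    d = [s - m for s, m in zip(sample, mu)]
    quad = sum(di * xi for di, xi in zip(d, solve(cov, d)))
    return math.sqrt(max(0.0, quad))

# Hypothetical 2-D embeddings: the retrieved set varies more along the
# first axis, so a unit offset along it counts for less than one along
# the second axis.
retrieved = [[0.0, 0.0], [4.0, 0.0], [0.0, 1.0], [4.0, 1.0]]
along_wide_axis = mahalanobis([3.0, 0.5], retrieved)
along_narrow_axis = mahalanobis([2.0, 1.5], retrieved)
```

Unlike Euclidean distance, this measure scales each direction by how much the retrieved samples vary along it, which is one plausible reason to prefer it when pulling a sample toward a distribution rather than toward a single point.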

Anthology ID: 2023.findings-emnlp.48
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 664–673
URL: https://aclanthology.org/2023.findings-emnlp.48/
DOI: 10.18653/v1/2023.findings-emnlp.48
Cite (ACL): Kun Zhang, Xiexiong Lin, Yuanzhuo Wang, Xin Zhang, Fei Sun, Cen Jianhe, Hexiang Tan, Xuhui Jiang, and Huawei Shen. 2023. ReFSQL: A Retrieval-Augmentation Framework for Text-to-SQL Generation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 664–673, Singapore. Association for Computational Linguistics.
Cite (Informal): ReFSQL: A Retrieval-Augmentation Framework for Text-to-SQL Generation (Zhang et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.48.pdf
