Weakly Supervised Text-to-SQL Parsing through Question Decomposition

Tomer Wolfson, Daniel Deutch, Jonathan Berant


Abstract
Text-to-SQL parsers are crucial in enabling non-experts to effortlessly query relational data. Training such parsers, by contrast, generally requires expertise in annotating natural language (NL) utterances with corresponding SQL queries. In this work, we propose a weak supervision approach for training text-to-SQL parsers. We take advantage of the recently proposed question meaning representation called QDMR, an intermediate between NL and formal query languages. Given questions, their QDMR structures (annotated by non-experts or automatically predicted), and the answers, we are able to automatically synthesize SQL queries that are used to train text-to-SQL models. We test our approach by experimenting on five benchmark datasets. Our results show that the weakly supervised models perform competitively with those trained on annotated NL-SQL data. Overall, we effectively train text-to-SQL parsers, while using zero SQL annotations.
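The abstract's idea of using answers as a weak supervision signal can be illustrated with a minimal sketch: run each candidate SQL query and keep one whose result matches the known answer. This is not the authors' implementation. In the paper the candidates are synthesized from QDMR structures, whereas here they are hand-written, and the toy `singer` table, the question, and the helper names `execute` and `select_consistent_sql` are all assumptions made for illustration.

```python
import sqlite3


def execute(conn, sql):
    """Run a query and return its rows as a sorted list, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows)


def select_consistent_sql(conn, candidates, answer):
    """Return the first candidate whose execution result equals the answer."""
    # Answers are compared as single-column rows, ignoring order.
    expected = sorted((value,) for value in answer)
    for sql in candidates:
        if execute(conn, sql) == expected:
            return sql
    return None


# Hypothetical toy database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (name TEXT, age INTEGER);
    INSERT INTO singer VALUES ('Ann', 25), ('Bob', 41), ('Cy', 33);
    """
)

question = "What are the names of singers older than 30?"
answer = ["Bob", "Cy"]

# Hand-written stand-ins for automatically synthesized candidates.
candidates = [
    "SELECT name FROM singer WHERE age < 30",   # wrong condition
    "SELECT title FROM singer WHERE age > 30",  # invalid column, fails to run
    "SELECT name FROM singer WHERE age > 30",   # consistent with the answer
]

chosen = select_consistent_sql(conn, candidates, answer)
```

A query kept by this check could then be paired with its question as an NL-SQL training example. Matching the answer does not guarantee that the query is semantically correct, so the check is only a filter.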
Anthology ID:
2022.findings-naacl.193
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2528–2542
URL:
https://aclanthology.org/2022.findings-naacl.193
DOI:
10.18653/v1/2022.findings-naacl.193
Cite (ACL):
Tomer Wolfson, Daniel Deutch, and Jonathan Berant. 2022. Weakly Supervised Text-to-SQL Parsing through Question Decomposition. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2528–2542, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Weakly Supervised Text-to-SQL Parsing through Question Decomposition (Wolfson et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.193.pdf
Video:
https://aclanthology.org/2022.findings-naacl.193.mp4
Code:
tomerwolgithub/question-decomposition-to-sql