Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion

Chen Zhao, Yu Su, Adam Pauls, Emmanouil Antonios Platanios


Abstract
Text-to-SQL parsers map natural language questions to programs that are executable over tables to generate answers, and are typically evaluated on large-scale datasets like Spider (Yu et al., 2018). We argue that existing benchmarks fail to capture a certain out-of-domain generalization problem that is of significant practical importance: matching domain-specific phrases to composite operations over columns. To study this problem, we first propose a synthetic dataset along with a re-purposed train/test split of the Squall dataset (Shi et al., 2020) as new benchmarks to quantify domain generalization over column operations, and find that existing state-of-the-art parsers struggle on these benchmarks. We propose to address this problem by incorporating prior domain knowledge through preprocessing of table schemas, and design a method that consists of two components: schema expansion and schema pruning. This method can be easily applied to multiple existing base parsers, and we show that it significantly outperforms baseline parsers on this domain generalization problem, boosting the underlying parsers’ overall performance by up to 13.8% relative accuracy gain (5.1% absolute) on the new Squall data split.
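To make the two components named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that schema expansion adds derived columns built from template operations over existing columns, and that schema pruning keeps only the derived columns most relevant to a question. The templates, the function names `expand_schema` and `prune_schema`, and the token-overlap scorer are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical templates (assumption): derived-column name and SQL expression.
TEMPLATES = [
    ("{a}_plus_{b}", "{a} + {b}"),
    ("{a}_minus_{b}", "{a} - {b}"),
]


def expand_schema(numeric_columns):
    """Return {derived_name: sql_expression} for each pair of numeric columns."""
    expanded = {}
    for a, b in combinations(numeric_columns, 2):
        for name_pattern, sql_pattern in TEMPLATES:
            expanded[name_pattern.format(a=a, b=b)] = sql_pattern.format(a=a, b=b)
    return expanded


def prune_schema(question, expanded, keep=1):
    """Keep the derived columns whose name tokens overlap most with the question.

    Token overlap is a toy stand-in (assumption) for a learned relevance model.
    """
    question_tokens = set(question.lower().split())

    def score(name):
        return len(question_tokens & set(name.split("_")))

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(expanded, key=lambda name: (-score(name), name))
    return {name: expanded[name] for name in ranked[:keep]}


expanded = expand_schema(["salary", "stock"])
pruned = prune_schema("total of salary plus stock", expanded, keep=1)
```

In this sketch a base parser would then see the pruned, expanded schema and could select a derived column directly instead of having to compose the operation itself for an unseen domain.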
Anthology ID:
2022.acl-long.381
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5568–5578
URL:
https://aclanthology.org/2022.acl-long.381
DOI:
10.18653/v1/2022.acl-long.381
Cite (ACL):
Chen Zhao, Yu Su, Adam Pauls, and Emmanouil Antonios Platanios. 2022. Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5568–5578, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion (Zhao et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.381.pdf