Testing Cross-Database Semantic Parsers With Canonical Utterances

Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, Xi Victoria Lin


Abstract
The benchmark performance of cross-database semantic parsing has climbed steadily in recent years, catalyzed by the wide adoption of pre-trained language models. Yet existing work has shown that state-of-the-art cross-database semantic parsers struggle to generalize to novel user utterances, databases, and query structures. To obtain transparent details on the strengths and limitations of these models, we propose a diagnostic testing approach based on controlled synthesis of canonical natural language and SQL pairs. Inspired by CheckList, we characterize a set of essential capabilities for cross-database semantic parsing models and detail the method for synthesizing the corresponding test data. We evaluate a variety of high-performing models using the proposed approach and identify several non-obvious weaknesses across models (e.g., an inability to correctly select many columns). Our dataset and code are released as a test suite at http://github.com/hclent/BehaviorCheckingSemPar.
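To make "controlled synthesis of canonical natural language and SQL pairs" concrete, the following is a minimal illustrative sketch, not the authors' generator. It assumes a single hypothetical table (`singer`) with hypothetical columns and one rigid utterance template, and it targets the column-selection capability mentioned in the abstract by varying how many columns are requested.

```python
from itertools import combinations


def synthesize_select_pairs(table, columns, max_columns=3):
    """Build (canonical utterance, SQL) pairs that select 1..max_columns columns.

    The template and the table/column names are illustrative assumptions,
    not taken from the released test suite.
    """
    pairs = []
    for k in range(1, max_columns + 1):
        for cols in combinations(columns, k):
            # Canonical utterance: a fixed, template-like phrasing.
            if len(cols) == 1:
                listed = cols[0]
            else:
                listed = ", ".join(cols[:-1]) + " and " + cols[-1]
            utterance = f"Show the {listed} of all {table}."
            # Gold SQL paired with the utterance.
            sql = f"SELECT {', '.join(cols)} FROM {table}"
            pairs.append((utterance, sql))
    return pairs


pairs = synthesize_select_pairs("singer", ["name", "age", "country"])
for utterance, sql in pairs:
    print(utterance, "->", sql)
```

Because every pair comes from the same template, a parser's errors on such data can be attributed to the varied factor (here, the number of selected columns) rather than to differences in phrasing.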
Anthology ID:
2021.eval4nlp-1.8
Volume:
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Yang Gao, Steffen Eger, Wei Zhao, Piyawat Lertvittayakumjorn, Marina Fomicheva
Venue:
Eval4NLP
Publisher:
Association for Computational Linguistics
Pages:
73–83
URL:
https://aclanthology.org/2021.eval4nlp-1.8
DOI:
10.18653/v1/2021.eval4nlp-1.8
Cite (ACL):
Heather Lent, Semih Yavuz, Tao Yu, Tong Niu, Yingbo Zhou, Dragomir Radev, and Xi Victoria Lin. 2021. Testing Cross-Database Semantic Parsers With Canonical Utterances. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 73–83, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Testing Cross-Database Semantic Parsers With Canonical Utterances (Lent et al., Eval4NLP 2021)
PDF:
https://aclanthology.org/2021.eval4nlp-1.8.pdf
Code:
hclent/behaviorcheckingsempar