CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing

Andy Rosenbaum, Saleh Soltan, Wael Hamza, Marco Damonte, Isabel Groves, Amir Saffari


Abstract
A bottleneck to developing Semantic Parsing (SP) models is the need for a large volume of human-labeled training data. Given the complexity and cost of human annotation for SP, labeled data is often scarce, particularly in multilingual settings. Large Language Models (LLMs) excel at SP given only a few examples; however, LLMs are unsuitable for runtime systems that require low latency. In this work, we propose CLASP, a simple method to improve low-resource SP for moderate-sized models: we generate synthetic data from AlexaTM 20B to augment the training set for a model 40x smaller (500M parameters). We evaluate on two datasets in low-resource settings: English PIZZA, containing either 348 or 16 real examples, and mTOP cross-lingual zero-shot, where training data is available only in English and the model must generalize to four new languages. On both datasets, we show significant improvements over strong baseline methods.
Anthology ID:
2022.aacl-short.56
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
444–462
URL:
https://aclanthology.org/2022.aacl-short.56
Cite (ACL):
Andy Rosenbaum, Saleh Soltan, Wael Hamza, Marco Damonte, Isabel Groves, and Amir Saffari. 2022. CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 444–462, Online only. Association for Computational Linguistics.
Cite (Informal):
CLASP: Few-Shot Cross-Lingual Data Augmentation for Semantic Parsing (Rosenbaum et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-short.56.pdf