PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks

Yufei Wang, Can Xu, Qingfeng Sun, Huang Hu, Chongyang Tao, Xiubo Geng, Daxin Jiang


Abstract
This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which trains only a small-scale Soft Prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boosts the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary to unlabeled in-domain data; the NLU models can be further improved when the two are combined for training.
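The abstract's core mechanism (training only a small set of soft-prompt vectors while the PLM stays frozen) can be illustrated with a minimal NumPy sketch. Everything here is a toy stand-in: the "frozen PLM" is just a fixed linear map, and all names, shapes, and the learning rate are illustrative assumptions, not taken from PromDA.

```python
import numpy as np

# Toy sketch of prompt tuning: a frozen backbone (stood in for here by a
# fixed linear map W_frozen) plus a small set of trainable soft-prompt
# vectors prepended to the token embeddings. Only the prompt is updated.
# All names and shapes are illustrative, not from the paper.

rng = np.random.default_rng(0)
d = 8                                          # embedding dimension
n_prompt = 4                                   # number of soft-prompt vectors

W_frozen = rng.normal(size=(d, 1))             # frozen PLM stand-in (never updated)
soft_prompt = rng.normal(size=(n_prompt, d))   # the only trainable parameters

def forward(x_embeds):
    """Prepend the soft prompt, mean-pool the sequence, apply the frozen map."""
    seq = np.concatenate([soft_prompt, x_embeds], axis=0)
    return seq.mean(axis=0) @ W_frozen         # scalar prediction

x = rng.normal(size=(5, d))                    # embeddings of a 5-token input
target = 1.0
pred = forward(x)[0]
loss_before = (pred - target) ** 2

# One manual gradient step on the prompt only (chain rule: each prompt row
# contributes 1/N to the mean-pooled vector, N = total sequence length).
N = n_prompt + x.shape[0]
grad_row = 2.0 * (pred - target) * W_frozen[:, 0] / N
soft_prompt -= 0.05 * np.tile(grad_row, (n_prompt, 1))

loss_after = (forward(x)[0] - target) ** 2     # loss drops; W_frozen is untouched
```

Because gradients flow only into `soft_prompt`, the number of trained parameters is tiny (here 4 × 8 = 32 values), which is what lets prompt-based methods adapt a large frozen model with very little labeled data.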
Anthology ID:
2022.acl-long.292
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4242–4255
URL:
https://aclanthology.org/2022.acl-long.292
DOI:
10.18653/v1/2022.acl-long.292
Cite (ACL):
Yufei Wang, Can Xu, Qingfeng Sun, Huang Hu, Chongyang Tao, Xiubo Geng, and Daxin Jiang. 2022. PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4242–4255, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks (Wang et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.292.pdf
Software:
 2022.acl-long.292.software.zip
Code
 garyyufei/promda
Data
CoNLL 2003, SST, SST-2