FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning

Jing Zhou, Yanan Zheng, Jie Tang, Li Jian, Zhilin Yang


Abstract
Most previous methods for text data augmentation are limited to simple tasks and weak baselines. We explore data augmentation on hard tasks (i.e., few-shot natural language understanding) and strong baselines (i.e., pretrained models with over one billion parameters). Under this setting, we reproduced a large number of previous augmentation methods and found that these methods bring marginal gains at best and sometimes degrade performance substantially. To address this challenge, we propose FlipDA, a novel data augmentation method that jointly uses a generative model and a classifier to generate label-flipped data. Central to FlipDA is the discovery that generating label-flipped data is more crucial to performance than generating label-preserved data. Experiments show that FlipDA achieves a good tradeoff between effectiveness and robustness: it substantially improves many tasks while not negatively affecting the others.
Anthology ID:
2022.acl-long.592
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8646–8665
URL:
https://aclanthology.org/2022.acl-long.592
DOI:
10.18653/v1/2022.acl-long.592
Cite (ACL):
Jing Zhou, Yanan Zheng, Jie Tang, Li Jian, and Zhilin Yang. 2022. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8646–8665, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning (Zhou et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.592.pdf
Software:
2022.acl-long.592.software.zip
Code:
zhouj8553/flipda
Data:
BoolQ, COPA, MultiRC, ReCoRD, SuperGLUE, WSC, WiC