Dynamic Augmentation Data Selection for Few-shot Text Classification

Guangliang Liu, Lifeng Jin, Owen Yuan, Jiayu Zhou


Abstract
Data augmentation has been a popular method for fine-tuning pre-trained language models to increase model robustness and performance. With augmentation data coming from modifying gold training data (in-sample augmentation) or being harvested from general-domain unlabeled data (out-of-sample augmentation), the quality of such data is key to successful fine-tuning. In this paper, we propose a dynamic data selection method that selects effective augmentation data from different augmentation sources according to the model's learning stage, by identifying a set of augmentation samples that optimally facilitates the learning process of the current model. The method first filters out augmentation samples with noisy pseudo labels through a curriculum learning strategy, then estimates the effectiveness of the remaining augmentation data by its influence scores on the current model at every update, so that data selection is tightly tailored to the model parameters. A two-stage augmentation strategy further applies in-sample and out-of-sample augmentation at different learning stages. Experiments with both kinds of augmentation data on a variety of sentence classification tasks show that our method outperforms strong baselines, demonstrating its effectiveness. Analysis confirms the dynamic nature of data effectiveness and the importance of model learning stages in utilizing augmentation data.
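The selection step described in the abstract scores each augmentation candidate by how much it is expected to help the current model, re-estimated at every update. Below is a minimal sketch of one way such influence-based scoring could look, using a first-order gradient-alignment approximation in PyTorch; the function names and the specific approximation are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical sketch of influence-based scoring for augmentation candidates.
    # The gradient dot-product approximation and all names here are assumptions
    # for illustration, not the authors' published algorithm.
    import torch


    def flat_grad(loss, params):
        """Flatten the gradient of `loss` w.r.t. `params` into one vector."""
        grads = torch.autograd.grad(loss, params, allow_unused=True)
        return torch.cat([
            (g if g is not None else torch.zeros_like(p)).reshape(-1)
            for g, p in zip(grads, params)
        ])


    def score_augmentation(model, loss_fn, gold_batch, aug_samples):
        """Score each (input, pseudo-label) augmentation pair by the inner
        product of its gradient with the gradient of the gold few-shot batch
        on the current model; higher means its update direction agrees with
        the gold data at this point in training."""
        params = [p for p in model.parameters() if p.requires_grad]

        gold_x, gold_y = gold_batch
        gold_grad = flat_grad(loss_fn(model(gold_x), gold_y), params)

        scores = []
        for x, y in aug_samples:
            aug_loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            scores.append(torch.dot(gold_grad, flat_grad(aug_loss, params)).item())
        return scores


    def select_top_k(aug_samples, scores, k):
        """Keep the k candidates whose updates best align with the gold data."""
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [aug_samples[i] for i in order[:k]]

In the two-stage setting the abstract outlines, a routine like this could plausibly be run over in-sample candidates early in training and over pseudo-labeled out-of-sample candidates later, after the curriculum filter has removed noisy pseudo labels; how the stages are scheduled is detailed in the paper itself.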
Anthology ID:
2022.findings-emnlp.356
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4841–4852
URL:
https://aclanthology.org/2022.findings-emnlp.356
DOI:
10.18653/v1/2022.findings-emnlp.356
Cite (ACL):
Guangliang Liu, Lifeng Jin, Owen Yuan, and Jiayu Zhou. 2022. Dynamic Augmentation Data Selection for Few-shot Text Classification. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4841–4852, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Dynamic Augmentation Data Selection for Few-shot Text Classification (Liu et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.356.pdf