Learning Data Augmentation Schedules for Natural Language Processing

Daphné Chopard, Matthias S. Treder, Irena Spasić


Abstract
Despite its proven efficiency in other fields, data augmentation is less popular in the context of natural language processing (NLP) due to its complexity and limited results. A recent study (Longpre et al., 2020) showed, for example, that task-agnostic data augmentations fail to consistently boost the performance of pretrained transformers, even in low-data regimes. In this paper, we investigate whether data-driven augmentation scheduling and the integration of a wider set of transformations can lead to improved performance where fixed and limited policies were unsuccessful. Our results suggest that, while this approach can help the training process in some settings, the improvements are unsubstantial. This negative result is meant to help researchers better understand the limitations of data augmentation for NLP.
Anthology ID:
2021.insights-1.14
Volume:
Proceedings of the Second Workshop on Insights from Negative Results in NLP
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
João Sedoc, Anna Rogers, Anna Rumshisky, Shabnam Tafreshi
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
89–102
URL:
https://aclanthology.org/2021.insights-1.14
DOI:
10.18653/v1/2021.insights-1.14
Cite (ACL):
Daphné Chopard, Matthias S. Treder, and Irena Spasić. 2021. Learning Data Augmentation Schedules for Natural Language Processing. In Proceedings of the Second Workshop on Insights from Negative Results in NLP, pages 89–102, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Learning Data Augmentation Schedules for Natural Language Processing (Chopard et al., insights 2021)
PDF:
https://aclanthology.org/2021.insights-1.14.pdf
Video:
https://aclanthology.org/2021.insights-1.14.mp4
Code:
chopardda/ldas-nlp
Data:
MultiNLI, SST, SST-2