Amit Sah
2022
DeepADA:An Attention-Based Deep Learning Framework for Augmenting Imbalanced Textual Datasets
Amit Sah
|
Muhammad Abulaish
Proceedings of the 19th International Conference on Natural Language Processing (ICON)
In this paper, we present an attention-based deep learning framework, DeepADA, which uses data augmentation to address the class imbalance problem in textual datasets. The proposed framework carries out the following functions:(i) using MPNET-based embeddings to extract keywords out of documents from the minority class, (ii) making use of a CNN-BiLSTM architecture with parallel attention to learn the important contextual words associated with the minority class documents’ keywords and provide them with word-level characteristics derived from their statistical and semantic features, (iii) using MPNET, replacing the key contextual terms derived from the oversampled documents that match to a keyword with the contextual term that best fits the context, and finally (iv) oversampling the minority class dataset to produce a balanced dataset. Using a 2-layer stacked BiLSTM classifier, we assess the efficacy of the proposed framework using the original and oversampled versions of three Amazon’s reviews datasets. We contrast the proposed data augmentation approach with two state-of-the-art text data augmentation methods. The experimental results reveal that our method produces an oversampled dataset that is more useful and helps the classifier perform better than the other two state-of-the-art methods. Nevertheless, we discover that the oversampled datasets outperformed their original ones by a wide margin.