DeepADA: An Attention-Based Deep Learning Framework for Augmenting Imbalanced Textual Datasets

Amit Sah, Muhammad Abulaish


Abstract
In this paper, we present DeepADA, an attention-based deep learning framework that uses data augmentation to address the class imbalance problem in textual datasets. The proposed framework performs the following steps: (i) it uses MPNet-based embeddings to extract keywords from minority-class documents; (ii) it uses a CNN-BiLSTM architecture with parallel attention, supplied with word-level features derived from statistical and semantic properties, to learn the important contextual words associated with the keywords of minority-class documents; (iii) in the documents selected for oversampling, it uses MPNet to replace the key contextual terms associated with a matched keyword with the term that best fits the context; and finally, (iv) it oversamples the minority class to produce a balanced dataset. Using a 2-layer stacked BiLSTM classifier, we evaluate the proposed framework on the original and oversampled versions of three Amazon review datasets. We compare the proposed data augmentation approach with two state-of-the-art text data augmentation methods. The experimental results show that our method produces a more useful oversampled dataset, enabling the classifier to perform better than with either of the two state-of-the-art methods. Moreover, classifiers trained on the oversampled datasets outperform those trained on the original datasets by a wide margin.
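The keyword-extraction, term-replacement, and balancing steps (i), (iii), and (iv) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: toy 3-dimensional vectors stand in for MPNet embeddings, a hand-written table (`SUBSTITUTES`) stands in for MPNet's prediction of the best-fitting term, the set of contextual terms to replace is given by hand rather than learned by the CNN-BiLSTM attention model of step (ii), and every function and variable name is hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def extract_keywords(doc_vec: Vector, word_vecs: Dict[str, Vector], top_k: int) -> List[str]:
    """Stand-in for step (i): rank candidate words by similarity to the document embedding."""
    ranked = sorted(word_vecs, key=lambda w: cosine(doc_vec, word_vecs[w]), reverse=True)
    return ranked[:top_k]


def replace_context_terms(tokens: List[str], targets: Set[str],
                          best_fit: Callable[[List[str], int], str]) -> List[str]:
    """Stand-in for step (iii): swap each targeted term for the candidate chosen by best_fit."""
    return [best_fit(tokens, i) if tok in targets else tok for i, tok in enumerate(tokens)]


def oversample(minority: List[List[str]], majority_size: int,
               make_variant: Callable[[List[str]], List[str]]) -> List[List[str]]:
    """Step (iv): cycle through minority documents, adding variants until classes are equal."""
    if not minority:
        raise ValueError("need at least one minority-class document to oversample")
    needed = max(0, majority_size - len(minority))
    synthetic = [make_variant(minority[i % len(minority)]) for i in range(needed)]
    return minority + synthetic


# --- Toy usage (all data hypothetical) ---
word_vecs = {"battery": [0.9, 0.1, 0.0], "dies": [0.8, 0.3, 0.1], "the": [0.1, 0.1, 0.9]}
keywords = extract_keywords([1.0, 0.2, 0.0], word_vecs, top_k=2)  # ['battery', 'dies']

# Hypothetical stand-in for a masked-language-model prediction of the best-fitting term.
SUBSTITUTES = {"quickly": "fast", "terrible": "awful"}


def best_fit(tokens: List[str], i: int) -> str:
    return SUBSTITUTES.get(tokens[i], tokens[i])


minority = [["battery", "dies", "quickly"], ["terrible", "battery"]]
balanced = oversample(
    minority,
    majority_size=5,
    make_variant=lambda doc: replace_context_terms(doc, {"quickly", "terrible"}, best_fit),
)
print(len(balanced), balanced[2], balanced[3])
```

Because this toy scorer is deterministic, a seed document that is reused yields identical synthetic copies; a context-sensitive scorer such as a masked language model, possibly sampling among its top candidates, would be the natural way to obtain more varied replacements.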
Anthology ID:
2022.icon-main.38
Volume:
Proceedings of the 19th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2022
Address:
New Delhi, India
Editors:
Md. Shad Akhtar, Tanmoy Chakraborty
Venue:
ICON
Publisher:
Association for Computational Linguistics
Pages:
318–327
URL:
https://aclanthology.org/2022.icon-main.38
Cite (ACL):
Amit Sah and Muhammad Abulaish. 2022. DeepADA: An Attention-Based Deep Learning Framework for Augmenting Imbalanced Textual Datasets. In Proceedings of the 19th International Conference on Natural Language Processing (ICON), pages 318–327, New Delhi, India. Association for Computational Linguistics.
Cite (Informal):
DeepADA: An Attention-Based Deep Learning Framework for Augmenting Imbalanced Textual Datasets (Sah & Abulaish, ICON 2022)
PDF:
https://aclanthology.org/2022.icon-main.38.pdf