STA: Self-controlled Text Augmentation for Improving Text Classifications

Congcong Wang, Gonzalo Fiz Pontiveros, Steven Derby, Tri Kurniawan Wijaya


Abstract
Despite recent advancements in Machine Learning, many tasks still involve working in low-data regimes, which can make solving natural language problems difficult. Recently, a number of text augmentation techniques have emerged in the field of Natural Language Processing (NLP) that can enrich the training data with new examples, though they are not without their caveats. For instance, simple rule-based heuristic methods are effective but lack variation in semantic content and syntactic structure with respect to the original text. On the other hand, more complex deep learning approaches can cause extreme shifts in the intrinsic meaning of the text and introduce unwanted noise into the training data. To more reliably control the quality of the augmented examples, we introduce a state-of-the-art approach for Self-Controlled Text Augmentation (STA). Our approach tightly controls the generation process by introducing a self-checking procedure to ensure that generated examples retain the semantic content of the original text. Experimental results on multiple benchmark datasets demonstrate that STA substantially outperforms existing state-of-the-art techniques, while qualitative analysis reveals that the generated examples are both lexically diverse and semantically reliable.
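To make the generate-then-self-check idea concrete, the sketch below illustrates the general pattern the abstract describes: produce candidate augmentations for a training example, then keep only candidates whose semantic similarity to the original text passes a check. This is an illustrative sketch, not the paper's implementation: the user-supplied `generate` callable, the sentence-transformers encoder, the similarity threshold, and the ranking step are all assumptions introduced here for demonstration.

```python
# Minimal sketch of a generate-then-self-check augmentation loop (illustrative only;
# not the STA implementation from the paper). Assumes a user-supplied `generate`
# callable, e.g. a fine-tuned paraphrase or seq2seq model, and uses a sentence
# encoder for the semantic-retention check.
from typing import Callable, List

from sentence_transformers import SentenceTransformer, util

# Any sentence encoder works; this model name is an illustrative choice.
_encoder = SentenceTransformer("all-MiniLM-L6-v2")


def self_checked_augment(
    text: str,
    generate: Callable[[str, int], List[str]],  # hypothetical generator: (text, n) -> candidates
    n_candidates: int = 10,
    sim_threshold: float = 0.8,   # illustrative cut-off, not the paper's value
    max_keep: int = 3,
) -> List[str]:
    """Generate candidate augmentations and keep only those whose embedding
    similarity to the original text exceeds `sim_threshold`."""
    candidates = generate(text, n_candidates)
    src_emb = _encoder.encode(text, convert_to_tensor=True)
    cand_embs = _encoder.encode(candidates, convert_to_tensor=True)
    sims = util.cos_sim(src_emb, cand_embs)[0]

    # Rank candidates by similarity to the source and keep the most faithful
    # ones that are not verbatim copies of the original text.
    ranked = sorted(zip(candidates, sims.tolist()), key=lambda pair: -pair[1])
    kept = [c for c, s in ranked if s >= sim_threshold and c.strip() != text.strip()]
    return kept[:max_keep]
```

In this pattern, the filtering step plays the role of the "self-check": candidates that drift too far from the original meaning are discarded before they are added to the training data, trading some augmentation volume for semantic reliability.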
Anthology ID: 2024.ecnlp-1.11
Volume: Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Shervin Malmasi, Besnik Fetahu, Nicola Ueffing, Oleg Rokhlenko, Eugene Agichtein, Ido Guy
Venues: ECNLP | WS
Publisher: ELRA and ICCL
Pages: 97–114
URL: https://aclanthology.org/2024.ecnlp-1.11
PDF: https://aclanthology.org/2024.ecnlp-1.11.pdf
Cite (ACL): Congcong Wang, Gonzalo Fiz Pontiveros, Steven Derby, and Tri Kurniawan Wijaya. 2024. STA: Self-controlled Text Augmentation for Improving Text Classifications. In Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024, pages 97–114, Torino, Italia. ELRA and ICCL.
Cite (Informal): STA: Self-controlled Text Augmentation for Improving Text Classifications (Wang et al., ECNLP-WS 2024)