Boosting Text Augmentation via Hybrid Instance Filtering Framework

Heng Yang; Ke Li

doi:10.18653/v1/2023.findings-acl.105

Abstract

Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large public datasets. Our research indicates that existing augmentation methods often generate instances with shifted feature spaces, which leads to a drop in performance on the augmented data (for example, EDA generally loses approximately 2% in aspect-based sentiment classification). To address this problem, we propose a hybrid instance-filtering framework (BoostAug) based on pre-trained language models that keeps the feature space of augmented instances similar to that of natural datasets. BoostAug is transferable to existing text augmentation methods (such as synonym substitution and back translation) and significantly improves augmentation performance, by 2-3% in classification accuracy. Our experimental results on three classification tasks and nine public datasets show that BoostAug addresses the performance drop problem and outperforms state-of-the-art text augmentation methods. Additionally, we release the code to help improve existing augmentation methods on large datasets.

Anthology ID: 2023.findings-acl.105
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1652–1669
URL: https://aclanthology.org/2023.findings-acl.105/
DOI: 10.18653/v1/2023.findings-acl.105
Cite (ACL): Heng Yang and Ke Li. 2023. Boosting Text Augmentation via Hybrid Instance Filtering Framework. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1652–1669, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Boosting Text Augmentation via Hybrid Instance Filtering Framework (Yang & Li, Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.105.pdf
