Empowering Large Language Models for Textual Data Augmentation

Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee


Abstract
With their ability to understand and execute natural language instructions, large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on the augmentation instructions provided, and effectiveness can fluctuate across different downstream tasks. While manually crafting and selecting instructions can offer some improvement, this approach faces scalability and consistency issues in practice due to the diversity of downstream tasks. In this work, we address these limitations by proposing a new solution, which can automatically generate a large pool of augmentation instructions and select the most suitable task-informed instructions, thereby empowering LLMs to create high-quality augmented data for different downstream tasks. Empirically, the proposed approach consistently generates augmented data of higher quality than non-LLM and LLM-based data augmentation methods, leading to the best performance on 26 few-shot learning tasks sourced from a wide range of application domains.
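The abstract describes a two-stage idea: generate a pool of candidate augmentation instructions with an LLM, then select the instructions that are most informative for a given downstream task. The following is a minimal, hypothetical sketch of that workflow, not the paper's actual algorithm; the `llm` callable, the prompt wordings, and the validation-based scoring criterion are all illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Illustrative sketch only: the prompts, the `llm` callable, and the
# selection criterion are assumptions, not the authors' method.

def generate_instruction_pool(llm: Callable[[str], str],
                              task_description: str,
                              pool_size: int = 50) -> List[str]:
    """Ask an LLM to propose candidate augmentation instructions for a task."""
    prompt = (
        f"Task: {task_description}\n"
        "Propose one concise instruction for rewriting a training example "
        "so that its label is preserved but its surface form changes."
    )
    return [llm(prompt) for _ in range(pool_size)]

def select_task_informed_instructions(instructions: List[str],
                                      score_fn: Callable[[str], float],
                                      top_k: int = 5) -> List[Tuple[str, float]]:
    """Keep the instructions whose augmented data helps the task most.

    `score_fn` is an assumed criterion, e.g. train a small model on the
    data augmented with this instruction and return validation accuracy.
    """
    scored = sorted(((inst, score_fn(inst)) for inst in instructions),
                    key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def augment(llm: Callable[[str], str], instruction: str, example: str) -> str:
    """Apply a selected instruction to produce one augmented example."""
    return llm(f"{instruction}\n\nExample: {example}\nRewritten example:")
```

In this sketch, selection is driven entirely by the downstream task through `score_fn`, which mirrors the abstract's emphasis on task-informed instructions; the concrete scoring and generation procedures used in the paper are described in the full text.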
Anthology ID:
2024.findings-acl.756
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12734–12751
URL:
https://aclanthology.org/2024.findings-acl.756
Cite (ACL):
Yichuan Li, Kaize Ding, Jianling Wang, and Kyumin Lee. 2024. Empowering Large Language Models for Textual Data Augmentation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 12734–12751, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Empowering Large Language Models for Textual Data Augmentation (Li et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.756.pdf