ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval

Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, Chao Zhang


Abstract
With the development of large language models (LLMs), zero-shot learning has attracted much attention for various NLP tasks. Unlike prior works that generate training data with billion-scale natural language generation (NLG) models, we propose a retrieval-enhanced framework to create training data from a general-domain unlabeled corpus. To realize this, we first conduct contrastive pretraining to learn an unsupervised dense retriever for extracting the most relevant documents using class-descriptive verbalizers. We then further propose two simple strategies, namely Verbalizer Augmentation with Demonstrations and Self-consistency Guided Filtering, to improve the topic coverage of the dataset while removing noisy examples. Experiments on nine datasets demonstrate that ReGen achieves a 4.3% gain over the strongest baselines and saves around 70% of the time compared with baselines using large NLG models. Besides, ReGen can be naturally integrated with recently proposed large language models to boost performance.
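The abstract's core retrieval step is to embed class-descriptive verbalizers and an unlabeled corpus with a dense retriever, then pull the most relevant documents per class as pseudo-training data. The sketch below illustrates only that step, not the paper's contrastively pretrained retriever, progressive retrieval rounds, demonstration augmentation, or self-consistency filtering; the encoder name, verbalizers, corpus, and top_k value are all illustrative assumptions rather than the authors' setup.

    # Hedged sketch: building pseudo-labeled training data by retrieving
    # from an unlabeled corpus with class-descriptive verbalizers.
    # An off-the-shelf sentence-transformers encoder stands in for the
    # paper's contrastively pretrained retriever (assumption).
    from sentence_transformers import SentenceTransformer, util

    # Class-descriptive verbalizers (hypothetical topic-classification labels).
    verbalizers = {
        "sports":   "This article is about sports.",
        "business": "This article is about business.",
        "science":  "This article is about science and technology.",
    }

    # General-domain unlabeled corpus (placeholder documents).
    corpus = [
        "The team clinched the championship in overtime.",
        "Shares fell sharply after the quarterly earnings report.",
        "Researchers announced a new battery chemistry for storage.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in dense retriever
    doc_emb = model.encode(corpus, convert_to_tensor=True)

    top_k = 2  # pseudo-examples retrieved per class (illustrative choice)
    for label, query in verbalizers.items():
        q_emb = model.encode(query, convert_to_tensor=True)
        scores = util.cos_sim(q_emb, doc_emb)[0]        # cosine similarities
        hits = scores.topk(min(top_k, len(corpus)))     # most relevant docs
        for score, idx in zip(hits.values, hits.indices):
            # Each retrieved (document, label) pair becomes a pseudo-example.
            print(f"{label}\t{score:.3f}\t{corpus[idx]}")

In the full method, such retrieved pairs would then be expanded with demonstration-augmented verbalizers and filtered for self-consistency before training the classifier.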
Anthology ID:
2023.findings-acl.748
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11782–11805
URL:
https://aclanthology.org/2023.findings-acl.748
DOI:
10.18653/v1/2023.findings-acl.748
Cite (ACL):
Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, and Chao Zhang. 2023. ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11782–11805, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval (Yu et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.748.pdf
Video:
https://aclanthology.org/2023.findings-acl.748.mp4