Mixture of Soft Prompts for Controllable Data Generation

Derek Chen, Celine Lee, Yunan Lu, Domenic Rosati, Zhou Yu


Abstract
Large language models (LLMs) effectively generate fluent text when the target output follows natural language patterns. However, structured prediction tasks confine the output format to a limited ontology, causing even very large models to struggle since they were never trained with such restrictions in mind. The difficulty of using LLMs for direct prediction is exacerbated in few-shot learning scenarios, which commonly arise due to domain shift and resource limitations. We flip the problem on its head by leveraging the LLM as a tool for data augmentation rather than direct prediction. Our proposed Mixture of Soft Prompts (MSP) serves as a parameter-efficient procedure for generating multi-attribute data in a controlled manner. Denoising mechanisms are further applied to improve the quality of synthesized data. Automatic metrics show our method is capable of producing diverse and natural text, while preserving label semantics. Moreover, MSP achieves state-of-the-art results on three benchmarks when compared against strong baselines. Our method offers an alternate data-centric approach for applying LLMs to complex prediction tasks.
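The core idea of MSP is to learn a small soft prompt (a matrix of trainable embeddings) per attribute and compose several of them to condition generation on multiple attributes at once. The paper explores several composition strategies; the sketch below shows only a simple weighted average, with all names and shapes hypothetical rather than taken from the authors' implementation.

```python
import numpy as np

def mix_soft_prompts(prompts, weights):
    """Combine per-attribute soft prompts into one prompt matrix.

    prompts: dict mapping attribute name -> (prompt_len, dim) array
    weights: dict mapping attribute name -> scalar mixing weight
    Illustrative only; MSP studies multiple composition strategies.
    """
    total = sum(weights.values())
    return sum((weights[a] / total) * p for a, p in prompts.items())

# Hypothetical setup: two attributes, prompt length 4, embedding dim 8.
rng = np.random.default_rng(0)
prompts = {
    "intent:book_flight": rng.normal(size=(4, 8)),
    "slot:destination": rng.normal(size=(4, 8)),
}
mixed = mix_soft_prompts(prompts, {"intent:book_flight": 1.0,
                                   "slot:destination": 1.0})

# The mixed prompt is prepended to the input token embeddings before a
# frozen LLM, so generation is conditioned on both attributes at once.
input_embeds = rng.normal(size=(10, 8))  # token embeddings for a seed example
conditioned = np.concatenate([mixed, input_embeds], axis=0)
print(conditioned.shape)  # (14, 8)
```

In practice only the prompt embeddings are trained while the backbone LLM stays frozen, which is what makes the procedure parameter-efficient.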
Anthology ID:
2023.findings-emnlp.988
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14815–14833
URL:
https://aclanthology.org/2023.findings-emnlp.988
DOI:
10.18653/v1/2023.findings-emnlp.988
Cite (ACL):
Derek Chen, Celine Lee, Yunan Lu, Domenic Rosati, and Zhou Yu. 2023. Mixture of Soft Prompts for Controllable Data Generation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14815–14833, Singapore. Association for Computational Linguistics.
Cite (Informal):
Mixture of Soft Prompts for Controllable Data Generation (Chen et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.988.pdf