DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation

Yuxi Feng, Xiaoyuan Yi, Xiting Wang, Laks Lakshmanan, V.S., Xing Xie


Abstract
Self-training (ST) has prospered again in language understanding by augmenting the fine-tuning of big pre-trained models when labeled data is insufficient. However, it remains challenging to incorporate ST into attribute-controllable language generation. Augmented only by self-generated pseudo text, generation models over-exploit the previously learned text space and fail to explore a larger one, suffering from a restricted generalization boundary and limited controllability. In this work, we propose DuNST, a novel ST framework to tackle these problems. DuNST jointly models text generation and classification as a dual process and further perturbs and escapes from the collapsed space by adding two kinds of flexible noise. In this way, our model can construct and utilize both pseudo text generated from given labels and pseudo labels predicted from available unlabeled text, which are gradually refined during the ST phase. We theoretically demonstrate that DuNST can be regarded as enhancing exploration of the potentially larger real text space while maintaining exploitation, guaranteeing improved performance. Experiments on three controllable generation tasks show that DuNST significantly boosts control accuracy while maintaining generation fluency and diversity comparable to several strong baselines.
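The abstract describes the dual noisy self-training loop only at a high level. As a rough, hypothetical illustration of such a loop (not the authors' implementation), the Python sketch below mocks both pseudo-data directions with placeholder functions; generate_text, predict_label, add_text_noise, soften_label, and train are assumed stand-ins for a shared pre-trained model fine-tuned for both generation and classification.

    # Illustrative sketch of a dual noisy self-training loop (not the paper's code).
    # All model calls are hypothetical stand-ins.
    import random

    LABELS = ["positive", "negative"]

    def generate_text(label):          # stand-in: label-conditioned generation
        return f"some {label} sentence"

    def predict_label(text):           # stand-in: the dual classification direction
        return {"positive": 0.7, "negative": 0.3}

    def add_text_noise(text, p=0.1):   # "hard" noise: randomly drop tokens from pseudo text
        return " ".join(w for w in text.split() if random.random() > p)

    def soften_label(dist, eps=0.1):   # "soft" noise: smooth the predicted pseudo label
        k = len(dist)
        return {c: (1 - eps) * v + eps / k for c, v in dist.items()}

    def train(gen_data, clf_data):     # stand-in: jointly refit generator and classifier
        pass

    labeled = [("great movie", "positive")]
    unlabeled = ["the plot was dull"]

    for st_round in range(3):
        # direction 1: pseudo text generated from sampled labels, then perturbed
        pseudo_text = [(add_text_noise(generate_text(y)), y)
                       for y in random.choices(LABELS, k=4)]
        # direction 2: smoothed pseudo labels predicted for unlabeled text
        pseudo_labels = [(x, soften_label(predict_label(x))) for x in unlabeled]
        # retrain on gold plus pseudo data; pseudo data is refreshed each round
        labeled_soft = [(x, {y: 1.0}) for x, y in labeled]
        train(labeled + pseudo_text, labeled_soft + pseudo_labels)

In this sketch the noise enters in two places, matching the abstract's "two kinds of flexible noise": perturbation of the generated pseudo text and smoothing of the predicted pseudo labels; the exact noise forms used in the paper may differ.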
Anthology ID:
2023.acl-long.488
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8760–8785
URL:
https://aclanthology.org/2023.acl-long.488
DOI:
10.18653/v1/2023.acl-long.488
Cite (ACL):
Yuxi Feng, Xiaoyuan Yi, Xiting Wang, Laks Lakshmanan, V.S., and Xing Xie. 2023. DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8760–8785, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation (Feng et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.488.pdf
Video:
https://aclanthology.org/2023.acl-long.488.mp4