Reinforcement Learning with Token-level Feedback for Controllable Text Generation

Wendi Li; Wei Wei; Kaihe Xu; Wenfeng Xie; Dangyang Chen; Yu Cheng

doi:10.18653/v1/2024.findings-naacl.111

Abstract

To meet the requirements of real-world applications, it is essential to control the generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation, as most existing methods suffer from overfitting (finetuning-based methods) or semantic collapse (post-processing methods). However, current RL methods are generally guided by coarse-grained (sentence- or paragraph-level) feedback, which may lead to suboptimal performance because the semantics can twist or progress within a sentence. To address this, we propose a novel reinforcement learning algorithm named TOLE, which formulates TOken-LEvel rewards for controllable text generation and employs a “first-quantize-then-noise” paradigm to enhance the robustness of the RL algorithm. Furthermore, TOLE can be flexibly extended to multiple constraints with little computational expense. Experimental results show that our algorithm achieves superior performance on both single-attribute and multi-attribute control tasks. Our code is available at https://github.com/WindyLee0822/CTG.
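
The abstract names two ideas, token-level rewards and a “first-quantize-then-noise” step, without giving formulas. The sketch below is one possible reading, not the authors' implementation: it assumes a hypothetical attribute scorer that returns a score for every prefix of a generation, takes each token's reward to be the change in that score, then maps rewards to rank-based quantile bins and adds small uniform noise inside each bin. The function names, the rank-based binning, and the `num_bins` and `noise_scale` parameters are all illustrative assumptions.

```python
import random
from typing import List, Optional


def token_level_rewards(prefix_scores: List[float]) -> List[float]:
    """Assumed reward: the change in attribute score caused by each token.

    prefix_scores[t] is a (hypothetical) scorer's output for the first t
    tokens, so prefix_scores[0] scores the empty prefix.
    """
    return [after - before for before, after in zip(prefix_scores, prefix_scores[1:])]


def quantize_then_noise(
    rewards: List[float],
    num_bins: int = 4,
    noise_scale: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Replace each reward by its quantile-bin centre, then jitter it.

    Assumption: bins are formed by rank, and noise is uniform and scaled so
    that (for noise_scale <= 1) a value never leaves its own bin.
    """
    rng = rng or random.Random(0)
    n = len(rewards)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: rewards[i])  # indices by ascending reward
    half_width = 0.5 / num_bins
    out = [0.0] * n
    for rank, idx in enumerate(order):
        bin_id = rank * num_bins // n            # integer in 0 .. num_bins - 1
        centre = (bin_id + 0.5) / num_bins       # bin centre inside (0, 1)
        out[idx] = centre + noise_scale * rng.uniform(-half_width, half_width)
    return out


# Hypothetical prefix scores for a three-token generation.
scores = [0.5, 0.6, 0.2, 0.9]
raw = token_level_rewards(scores)
shaped = quantize_then_noise(raw, num_bins=3)
```

In this reading, quantization discards the raw magnitudes of a possibly miscalibrated scorer and keeps only their ordering, while the noise prevents the policy from fitting a handful of discrete reward values exactly.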

Anthology ID: 2024.findings-naacl.111
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1704–1719
URL: https://aclanthology.org/2024.findings-naacl.111
DOI: 10.18653/v1/2024.findings-naacl.111
Cite (ACL): Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, and Yu Cheng. 2024. Reinforcement Learning with Token-level Feedback for Controllable Text Generation. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1704–1719, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Reinforcement Learning with Token-level Feedback for Controllable Text Generation (Li et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-naacl.111.pdf
