@inproceedings{lee-etal-2023-learning,
title = "Learning to Rank Generation with Pairwise Partial Rewards",
author = "Lee, Youngwon and
Lee, Jinu and
Hwang, Seung-won",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.371",
doi = "10.18653/v1/2023.emnlp-main.371",
pages = "6078--6092",
abstract = "This paper studies the use of reinforcement learning for conditional text generation, which overcomes the limitation of the prevalent supervised maximum likelihood estimation approach. However, it still suffers from challenges including the large action space and the delayed reward, as the reward can be computed only after an entire sequence is generated. To address these challenges, we propose a method that provides partial rewards for intermediate actions taken on partial sequences. This enables the model to promptly prioritize actions that lead to the generation of more desirable sequences. Our method{'}s key contribution lies in its focus on distinguishing relatively more desirable actions rather than striving to precisely estimate pointwise values for arbitrary partial sequences. Instead, our model learns to discern the relative desirability between pairs of actions, or rank actions in a pairwise manner, only when necessary and feasible. This is materialized in an efficient way by leveraging the prefix tree constructed from the sampled sequences. Experimental results on paraphrase generation and constrained machine translation tasks showcase the effectiveness of our method.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="lee-etal-2023-learning">
<titleInfo>
<title>Learning to Rank Generation with Pairwise Partial Rewards</title>
</titleInfo>
<name type="personal">
<namePart type="given">Youngwon</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jinu</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Seung-won</namePart>
<namePart type="family">Hwang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2023-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Houda</namePart>
<namePart type="family">Bouamor</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Juan</namePart>
<namePart type="family">Pino</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kalika</namePart>
<namePart type="family">Bali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Singapore</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>This paper studies the use of reinforcement learning for conditional text generation, which overcomes the limitation of the prevalent supervised maximum likelihood estimation approach. However, it still suffers from challenges including the large action space and the delayed reward, as the reward can be computed only after an entire sequence is generated. To address these challenges, we propose a method that provides partial rewards for intermediate actions taken on partial sequences. This enables the model to promptly prioritize actions that lead to the generation of more desirable sequences. Our method’s key contribution lies in its focus on distinguishing relatively more desirable actions rather than striving to precisely estimate pointwise values for arbitrary partial sequences. Instead, our model learns to discern the relative desirability between pairs of actions, or rank actions in a pairwise manner, only when necessary and feasible. This is materialized in an efficient way by leveraging the prefix tree constructed from the sampled sequences. Experimental results on paraphrase generation and constrained machine translation tasks showcase the effectiveness of our method.</abstract>
<identifier type="citekey">lee-etal-2023-learning</identifier>
<identifier type="doi">10.18653/v1/2023.emnlp-main.371</identifier>
<location>
<url>https://aclanthology.org/2023.emnlp-main.371</url>
</location>
<part>
<date>2023-12</date>
<extent unit="page">
<start>6078</start>
<end>6092</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Learning to Rank Generation with Pairwise Partial Rewards
%A Lee, Youngwon
%A Lee, Jinu
%A Hwang, Seung-won
%Y Bouamor, Houda
%Y Pino, Juan
%Y Bali, Kalika
%S Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
%D 2023
%8 December
%I Association for Computational Linguistics
%C Singapore
%F lee-etal-2023-learning
%X This paper studies the use of reinforcement learning for conditional text generation, which overcomes the limitation of the prevalent supervised maximum likelihood estimation approach. However, it still suffers from challenges including the large action space and the delayed reward, as the reward can be computed only after an entire sequence is generated. To address these challenges, we propose a method that provides partial rewards for intermediate actions taken on partial sequences. This enables the model to promptly prioritize actions that lead to the generation of more desirable sequences. Our method’s key contribution lies in its focus on distinguishing relatively more desirable actions rather than striving to precisely estimate pointwise values for arbitrary partial sequences. Instead, our model learns to discern the relative desirability between pairs of actions, or rank actions in a pairwise manner, only when necessary and feasible. This is materialized in an efficient way by leveraging the prefix tree constructed from the sampled sequences. Experimental results on paraphrase generation and constrained machine translation tasks showcase the effectiveness of our method.
%R 10.18653/v1/2023.emnlp-main.371
%U https://aclanthology.org/2023.emnlp-main.371
%U https://doi.org/10.18653/v1/2023.emnlp-main.371
%P 6078-6092
Markdown (Informal)
[Learning to Rank Generation with Pairwise Partial Rewards](https://aclanthology.org/2023.emnlp-main.371) (Lee et al., EMNLP 2023)
ACL
Youngwon Lee, Jinu Lee, and Seung-won Hwang. 2023. Learning to Rank Generation with Pairwise Partial Rewards. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6078–6092, Singapore. Association for Computational Linguistics.
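
The abstract above describes ranking actions pairwise over a prefix tree built from sampled output sequences. As a rough illustration only (a minimal sketch, not the authors' implementation; the function names, data layout, and the use of best-final-reward comparisons at branching nodes are all assumptions), a trie over sampled sequences annotated with sequence-level rewards could expose such pairwise preferences at branching points:

```python
# Illustrative sketch (not the paper's released code): build a prefix tree over
# sampled token sequences, then at each branching node compare the best
# sequence-level reward reachable through each sibling action to obtain
# pairwise "prefer action a over action b" signals for partial sequences.

from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    best_reward: float = float("-inf")  # best final reward observed below this node


def build_prefix_tree(samples: List[Tuple[List[str], float]]) -> TrieNode:
    """Insert each sampled sequence together with its sequence-level reward."""
    root = TrieNode()
    for tokens, reward in samples:
        node = root
        node.best_reward = max(node.best_reward, reward)
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
            node.best_reward = max(node.best_reward, reward)
    return root


def pairwise_preferences(node: TrieNode, prefix: Optional[List[str]] = None):
    """Yield (prefix, better_action, worse_action) triples at branching nodes."""
    prefix = prefix or []
    actions = list(node.children.items())
    for (a, na), (b, nb) in combinations(actions, 2):
        if na.best_reward != nb.best_reward:
            better, worse = (a, b) if na.best_reward > nb.best_reward else (b, a)
            yield prefix, better, worse
    for tok, child in actions:
        yield from pairwise_preferences(child, prefix + [tok])


if __name__ == "__main__":
    # Toy samples: (token sequence, sequence-level reward such as BLEU).
    samples = [
        (["the", "cat", "sat"], 0.9),
        (["the", "dog", "sat"], 0.4),
        (["a", "cat", "sat"], 0.6),
    ]
    tree = build_prefix_tree(samples)
    for prefix, better, worse in pairwise_preferences(tree):
        print(f"after {prefix}: prefer '{better}' over '{worse}'")
```

The sketch only shows where pairwise comparisons become available from sampled sequences; how those comparisons are turned into partial rewards for policy training is described in the paper itself.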