Token-level Proximal Policy Optimization for Query Generation

Yichen Ouyang; Lu Wang; Fangkai Yang; Pu Zhao; Chenghua Huang; Jianfeng Liu; Bochen Pang; Yaming Yang; Yuefeng Zhan; Hao Sun; Qingwei Lin; Saravan Rajmohan; Weiwei Deng; Dongmei Zhang; Feng Sun

doi:10.18653/v1/2025.emnlp-main.1589

Abstract

Query generation is a critical task for web search engines (e.g., Google, Bing) and recommendation systems. Recently, state-of-the-art query generation methods have leveraged Large Language Models (LLMs) for their strong capabilities in context understanding and text generation. However, these methods still struggle to generate high-quality queries, particularly when inferring user intent from web search interaction history. In this paper, we propose Token-level Proximal Policy Optimization (TPPO), a novel approach designed to empower LLMs to perform better in query generation through fine-tuning. TPPO is based on the Reinforcement Learning from AI Feedback (RLAIF) paradigm and consists of a token-level reward model and a token-level proximal policy optimization module, which together address the sparse reward challenge in traditional RLAIF frameworks. We conducted experiments on both an open-source dataset and an industrial dataset collected from a globally used search engine, demonstrating that TPPO significantly improves the query generation performance of LLMs and outperforms existing competitors.
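
To make the sparse reward challenge mentioned in the abstract concrete, the following is a minimal sketch and not the TPPO implementation. It compares the return-to-go per token when a single sequence-level score is attached to the last token with the case where every token carries its own score. The function name `discounted_returns`, the 4-token query, and all reward values are assumptions chosen purely for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1} for each token position t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each position accumulates all later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 4-token query; the reward values are made up for illustration.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]    # sequence-level: one score at the last token
dense_rewards = [0.5, -0.25, 0.25, 0.5]  # token-level: one score per token

sparse_returns = discounted_returns(sparse_rewards)
dense_returns = discounted_returns(dense_rewards)

print(sparse_returns)
print(dense_returns)
```

With the sparse reward, every token receives the same return, so the signal alone cannot tell which tokens helped or hurt. With per-token rewards, the returns differ by position, which is the kind of finer-grained credit assignment a token-level reward is meant to provide.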

Anthology ID: 2025.emnlp-main.1589
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 31196–31210
URL: https://aclanthology.org/2025.emnlp-main.1589/
DOI: 10.18653/v1/2025.emnlp-main.1589
Cite (ACL): Yichen Ouyang, Lu Wang, Fangkai Yang, Pu Zhao, Chenghua Huang, Jianfeng Liu, Bochen Pang, Yaming Yang, Yuefeng Zhan, Hao Sun, Qingwei Lin, Saravan Rajmohan, Weiwei Deng, Dongmei Zhang, and Feng Sun. 2025. Token-level Proximal Policy Optimization for Query Generation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31196–31210, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Token-level Proximal Policy Optimization for Query Generation (Ouyang et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1589.pdf
Checklist: 2025.emnlp-main.1589.checklist.pdf
