@inproceedings{cai-etal-2025-k,
title = "K-order Ranking Preference Optimization for Large Language Models",
author = "Cai, Shihao and
Gao, Chongming and
Zhang, Yang and
Shi, Wentao and
Zhang, Jizhi and
Bao, Keqin and
Wang, Qifan and
Feng, Fuli",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.250/",
doi = "10.18653/v1/2025.findings-acl.250",
pages = "4844--4859",
ISBN = "979-8-89176-256-5",
abstract = "To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities.However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main reasons: (1) users are typically concerned with only the top-K results, making top-K ranking more important, and (2) tail items often lack precise feedback, making top-K ranking more reliable. Based on this, we propose $\textbf{K}$-order Ranking $\textbf{P}$reference $\textbf{O}$ptimization (KPO) by extending the DPO{'}s Plackett-Luce model to accommodate top-K rankings. Additionally, recognizing that the number of important items can vary across queries, we extend KPO to dynamically determine appropriate $K$ for different samples and introduce a curriculum learning strategy to boost training efficiency. Extensive experiments demonstrate the effectiveness of KPO, highlighting its high sample efficiency and robustness to noise. The code is available at https://github.com/Lanyu0303/KPO."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="cai-etal-2025-k">
<titleInfo>
<title>K-order Ranking Preference Optimization for Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Shihao</namePart>
<namePart type="family">Cai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chongming</namePart>
<namePart type="family">Gao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yang</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wentao</namePart>
<namePart type="family">Shi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jizhi</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Keqin</namePart>
<namePart type="family">Bao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Qifan</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fuli</namePart>
<namePart type="family">Feng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities. However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main reasons: (1) users are typically concerned with only the top-K results, making top-K ranking more important, and (2) tail items often lack precise feedback, making top-K ranking more reliable. Based on this, we propose K-order Ranking Preference Optimization (KPO) by extending the DPO’s Plackett-Luce model to accommodate top-K rankings. Additionally, recognizing that the number of important items can vary across queries, we extend KPO to dynamically determine appropriate K for different samples and introduce a curriculum learning strategy to boost training efficiency. Extensive experiments demonstrate the effectiveness of KPO, highlighting its high sample efficiency and robustness to noise. The code is available at https://github.com/Lanyu0303/KPO.</abstract>
<identifier type="citekey">cai-etal-2025-k</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.250</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.250/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>4844</start>
<end>4859</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T K-order Ranking Preference Optimization for Large Language Models
%A Cai, Shihao
%A Gao, Chongming
%A Zhang, Yang
%A Shi, Wentao
%A Zhang, Jizhi
%A Bao, Keqin
%A Wang, Qifan
%A Feng, Fuli
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F cai-etal-2025-k
%X To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities. However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main reasons: (1) users are typically concerned with only the top-K results, making top-K ranking more important, and (2) tail items often lack precise feedback, making top-K ranking more reliable. Based on this, we propose K-order Ranking Preference Optimization (KPO) by extending the DPO’s Plackett-Luce model to accommodate top-K rankings. Additionally, recognizing that the number of important items can vary across queries, we extend KPO to dynamically determine appropriate K for different samples and introduce a curriculum learning strategy to boost training efficiency. Extensive experiments demonstrate the effectiveness of KPO, highlighting its high sample efficiency and robustness to noise. The code is available at https://github.com/Lanyu0303/KPO.
%R 10.18653/v1/2025.findings-acl.250
%U https://aclanthology.org/2025.findings-acl.250/
%U https://doi.org/10.18653/v1/2025.findings-acl.250
%P 4844-4859
Markdown (Informal)
[K-order Ranking Preference Optimization for Large Language Models](https://aclanthology.org/2025.findings-acl.250/) (Cai et al., Findings 2025)
ACL
Shihao Cai, Chongming Gao, Yang Zhang, Wentao Shi, Jizhi Zhang, Keqin Bao, Qifan Wang, and Fuli Feng. 2025. K-order Ranking Preference Optimization for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4844–4859, Vienna, Austria. Association for Computational Linguistics.
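
For context on the abstract's mention of extending DPO's Plackett-Luce model to top-K rankings, here is a minimal sketch of the standard top-K Plackett-Luce likelihood over $N$ candidates with illustrative scores $s_i$ and permutation $\pi$; this is the textbook truncated form, not necessarily the exact KPO objective from the paper:

$$P(\pi_1, \dots, \pi_K \mid s) = \prod_{k=1}^{K} \frac{\exp(s_{\pi_k})}{\sum_{j=k}^{N} \exp(s_{\pi_j})}$$

Truncating the product at $K$ scores only the relative order of the top-$K$ items, which aligns with the abstract's motivation that tail positions carry less reliable feedback.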