SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment

Wenqiao Zhu; Ji Liu; Lulu Wang; Jun Wu; Yulun Zhang

doi:10.18653/v1/2025.findings-acl.639

Abstract

Direct Preference Optimization (DPO) is widely used to align Large Language Models (LLMs) with human values because of its flexibility. Despite its effectiveness, DPO has been observed to have limited ability to generate human-preferred responses, and its results are far from resilient. To address these limitations, we propose a novel Self-Guided Direct Preference Optimization algorithm, SGDPO, which incorporates a pilot term to steer the gradient flow during optimization, allowing fine-grained control over the updates of the chosen and rejected rewards. We provide a detailed theoretical analysis of the proposed method and elucidate its operational mechanism. We also conduct comprehensive experiments on various models and benchmarks. The experimental results are consistent with our theoretical analysis and confirm the effectiveness of the proposed approach (up to 9.19% higher score).
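
For context, the standard DPO objective that the abstract builds on can be sketched as follows. This is a minimal illustration of vanilla DPO only, not the SGDPO loss: the abstract does not specify the form of the pilot term, so it is omitted here. The function name `dpo_loss_and_grads`, its parameter names, and the default `beta=0.1` are assumptions chosen for illustration. The sketch shows that in plain DPO the chosen and rejected rewards receive gradients of equal magnitude and opposite sign, which may help explain why separate control over the two updates is of interest.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dpo_loss_and_grads(logp_chosen, logp_rejected,
                       ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Vanilla DPO loss for one preference pair (illustrative sketch only).

    Inputs are sequence log-probabilities under the policy and under a
    frozen reference model. All names here are hypothetical.
    """
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward

    # loss = -log(sigmoid(margin)), written as a stable softplus(-margin).
    if margin >= 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))

    # d(loss)/d(margin) = -sigmoid(-margin); the two rewards enter the
    # margin with opposite signs, so their gradients mirror each other.
    weight = sigmoid(-margin)
    return {
        "loss": loss,
        "chosen_reward": chosen_reward,
        "rejected_reward": rejected_reward,
        "grad_chosen_reward": -weight,
        "grad_rejected_reward": weight,
    }
```

Calling `dpo_loss_and_grads(-10.0, -12.0, -10.0, -12.0)` describes a policy identical to the reference: the margin is zero, so the loss equals log 2 and the two reward gradients are -0.5 and 0.5.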

Anthology ID: 2025.findings-acl.639
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12366–12383
URL: https://aclanthology.org/2025.findings-acl.639/
DOI: 10.18653/v1/2025.findings-acl.639
Cite (ACL): Wenqiao Zhu, Ji Liu, Lulu Wang, Jun Wu, and Yulun Zhang. 2025. SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment. In Findings of the Association for Computational Linguistics: ACL 2025, pages 12366–12383, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment (Zhu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.639.pdf
