AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models

Qi Liu; Jingqing Ruan; Hao Li; Haodong Zhao; Desheng Wang; Jiansong Chen; Wan Guanglu; Xunliang Cai; Zhi Zheng; Tong Xu

doi:10.18653/v1/2025.findings-acl.462

Abstract

Existing multi-objective preference alignment methods for large language models (LLMs) face two limitations: (1) they cannot effectively balance multiple preference dimensions, and (2) their reliance on auxiliary reward or reference models adds computational overhead. To address these challenges, we propose Adaptive Multi-objective Preference Optimization (AMoPO), a framework that dynamically balances preference dimensions. By adopting a multi-objective optimization paradigm that uses dimension-aware generation metrics as implicit rewards, AMoPO aligns LLMs with diverse preferences without additional reward models or reference models. We further introduce an adaptive weight assignment mechanism that models the generation space as a Gaussian distribution, allowing preference dimensions to be prioritized dynamically. Empirical results show that AMoPO outperforms state-of-the-art baselines by 28.5%, and experiments on 7B, 14B, and 32B models demonstrate its ability to scale. Additional analysis across multiple dimensions confirms its adaptability and effectiveness. These findings show that AMoPO achieves dimension-aware preference alignment. Our code and datasets are available at https://github.com/Javkonline/AMoPO.
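The abstract does not give AMoPO's formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method (see the linked repository for that). It assumes (a) a reference-free implicit reward per preference dimension, taken here as a SimPO-style length-normalized log-likelihood scaled by `beta`, with a target margin `gamma`; and (b) a hypothetical Gaussian-based weighting rule: fit a Gaussian to the per-dimension reward margins and give more weight to dimensions that fall in its lower tail. All function and parameter names (`implicit_reward`, `gaussian_weights`, `multi_objective_loss`, `beta`, `gamma`) are illustrative.

```python
import math
from typing import Sequence


def implicit_reward(token_logps: Sequence[float], beta: float = 2.0) -> float:
    """Assumed reference-free reward: beta * mean token log-prob (SimPO-style)."""
    if not token_logps:
        raise ValueError("token_logps must be non-empty")
    return beta * sum(token_logps) / len(token_logps)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def gaussian_weights(margins: Sequence[float]) -> list[float]:
    """Hypothetical adaptive weighting: fit N(mu, sigma^2) to the margins and
    weight each dimension by its upper-tail mass 1 - Phi(z), so dimensions
    with smaller margins (lagging behind) receive larger weights."""
    n = len(margins)
    mu = sum(margins) / n
    sigma = math.sqrt(sum((m - mu) ** 2 for m in margins) / n)
    if sigma == 0.0:
        return [1.0 / n] * n  # all dimensions equal: uniform weights
    # 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    raw = [0.5 * math.erfc((m - mu) / (sigma * math.sqrt(2.0))) for m in margins]
    total = sum(raw)
    return [r / total for r in raw]


def multi_objective_loss(
    chosen: Sequence[Sequence[float]],
    rejected: Sequence[Sequence[float]],
    beta: float = 2.0,
    gamma: float = 0.5,
) -> tuple[float, list[float]]:
    """chosen[d] / rejected[d]: token log-probs of the preferred / dispreferred
    response scored for preference dimension d. Returns (loss, weights)."""
    margins = [
        implicit_reward(c, beta) - implicit_reward(r, beta) - gamma
        for c, r in zip(chosen, rejected)
    ]
    # In real training the weights would be treated as constants (no gradient).
    weights = gaussian_weights(margins)
    loss = -sum(w * log_sigmoid(m) for w, m in zip(weights, margins))
    return loss, weights


if __name__ == "__main__":
    chosen = [[-0.2, -0.4], [-1.0, -1.2], [-0.5, -0.5]]
    rejected = [[-0.9, -1.1], [-1.1, -1.3], [-0.8, -1.0]]
    loss, weights = multi_objective_loss(chosen, rejected)
    print(f"loss={loss:.4f}", [round(w, 3) for w in weights])
```

In this toy input the second dimension has the smallest (negative) margin, so the weighting rule assigns it the largest share of the loss. Using the Gaussian tail mass rather than a hard argmin keeps every dimension in the objective while still shifting emphasis toward the weakest one.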

Anthology ID:: 2025.findings-acl.462
Volume:: Findings of the Association for Computational Linguistics: ACL 2025
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 8832–8866
URL:: https://aclanthology.org/2025.findings-acl.462/
DOI:: 10.18653/v1/2025.findings-acl.462
Cite (ACL):: Qi Liu, Jingqing Ruan, Hao Li, Haodong Zhao, Desheng Wang, Jiansong Chen, Wan Guanglu, Xunliang Cai, Zhi Zheng, and Tong Xu. 2025. AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8832–8866, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models (Liu et al., Findings 2025)
PDF:: https://aclanthology.org/2025.findings-acl.462.pdf