Wenxuan Zhou, Ravi Agrawal, Shujian Zhang, Sathish Reddy Indurthi, Sanqiang Zhao, Kaiqiang Song, Silei Xu, and Chenguang Zhu. 2024. WPO: Enhancing RLHF with Weighted Preference Optimization. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, edited by Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, pages 8328-8340, Miami, Florida, USA, November 2024. Association for Computational Linguistics. Conference publication. Citation key: zhou-etal-2024-wpo. DOI: 10.18653/v1/2024.emnlp-main.475. URL: https://aclanthology.org/2024.emnlp-main.475/