On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

Authors: Yong Lin, Skyler Seto, Maartje Ter Hoeve, Katherine Metcalf, Barry-John Theobald, Xuan Wang, Yizhe Zhang, Chen Huang, Tong Zhang
Published in: Findings of the Association for Computational Linguistics: EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics
Location: Miami, Florida, USA
Date: 2024-11
Type: conference publication
Pages: 16015-16026
Citation key: lin-etal-2024-limited
DOI: 10.18653/v1/2024.findings-emnlp.940
URL: https://aclanthology.org/2024.findings-emnlp.940/