Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

Authors: Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li
Published in: Findings of the Association for Computational Linguistics: ACL 2024 (conference publication)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Pages: 3705-3716
DOI: 10.18653/v1/2024.findings-acl.221
URL: https://aclanthology.org/2024.findings-acl.221/
Citation key: cheng-etal-2024-adversarial