Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation

Authors: Aiwei Liu, Haoping Bai, Zhiyun Lu, Xiang Kong, Xiaoming Wang, Jiulong Shan, Meng Cao, Lijie Wen
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Pages: 9688-9712
Type: conference publication
Citation key: liu-etal-2024-direct
DOI: 10.18653/v1/2024.acl-long.523
URL: https://aclanthology.org/2024.acl-long.523/