Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Somanshu Singla, Zhen Wang, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing


Abstract
Aligning Large Language Models (LLMs) traditionally relies on complex and costly training processes like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). To address the challenge of achieving alignment without these extensive tuning costs and expensive annotations, we present a novel, tuning-free approach for self-alignment called Dynamic Rewarding with Prompt Optimization (DRPO). Our approach enables self-alignment through a search-based prompt optimization framework, allowing the model to self-improve and generate optimized prompts without additional training or human supervision. At its core, DRPO leverages a dynamic rewarding mechanism to identify and rectify model-specific alignment weaknesses, enabling LLMs to adapt quickly to diverse alignment challenges. Empirical evaluations on eight recent LLMs, both open- and closed-source, show that DRPO significantly enhances alignment performance, enabling base models to outperform their SFT/RLHF-tuned counterparts. Moreover, DRPO’s automatically optimized prompts surpass those curated by human experts, further demonstrating its alignment capabilities. Our findings point to a highly cost-effective and adaptable direction for future alignment research.
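
For intuition only, the Python sketch below illustrates one way a search-based prompt-optimization loop with a dynamic, query-dependent reward could be organized. It is an assumption-laden reconstruction from the abstract, not the authors' released DRPO implementation; the callables generate, dynamic_reward, and revise_prompt are hypothetical placeholders (e.g., LLM API calls the reader would supply), and the simple beam search is an assumption about the search procedure.

# Illustrative sketch of tuning-free prompt optimization with a dynamic reward.
# NOT the authors' DRPO code: generate / dynamic_reward / revise_prompt are
# hypothetical callables, and beam search is an assumed search strategy.
from typing import Callable, List

def optimize_prompt(
    seed_prompt: str,
    dev_queries: List[str],
    generate: Callable[[str, str], str],          # (system_prompt, query) -> response
    dynamic_reward: Callable[[str, str], float],  # (query, response) -> score on
                                                  # query-specific alignment criteria
    revise_prompt: Callable[[str, str], str],     # (prompt, weakness note) -> new prompt
    iterations: int = 10,
    beam_width: int = 3,
) -> str:
    """Searches over system prompts; no model weights are ever updated."""

    def score(prompt: str) -> float:
        # Average the dynamic reward over the development queries for this prompt.
        return sum(dynamic_reward(q, generate(prompt, q)) for q in dev_queries) / len(dev_queries)

    beam = [(score(seed_prompt), seed_prompt)]
    for _ in range(iterations):
        candidates = list(beam)
        for _, prompt in beam:
            # Find the query where this prompt is weakest and revise toward it,
            # mimicking "identify and rectify model-specific alignment weaknesses".
            worst_q = min(dev_queries, key=lambda q: dynamic_reward(q, generate(prompt, q)))
            new_prompt = revise_prompt(prompt, f"Low reward on: {worst_q}")
            candidates.append((score(new_prompt), new_prompt))
        # Keep only the top-scoring candidate prompts for the next round.
        beam = sorted(candidates, key=lambda x: x[0], reverse=True)[:beam_width]
    return beam[0][1]

Because only the system prompt changes during this loop, the procedure is tuning-free in the sense the abstract describes: any alignment improvement comes from the optimized prompt rather than from updating model parameters.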
Anthology ID:
2024.emnlp-main.1220
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
21889–21909
URL:
https://aclanthology.org/2024.emnlp-main.1220/
DOI:
10.18653/v1/2024.emnlp-main.1220
Cite (ACL):
Somanshu Singla, Zhen Wang, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, and Eric P. Xing. 2024. Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21889–21909, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models (Singla et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1220.pdf