ARM: Alignment with Residual Energy-Based Model

Bo Pang; Caiming Xiong; Yingbo Zhou

Abstract

While large language models (LLMs) trained with large-scale unsupervised learning acquire a wide variety of world knowledge and skills, their behavior does not necessarily align with human preferences. Reinforcement learning from human feedback (RLHF) methods have succeeded in aligning LLM responses with human preferences and in improving the controllability of LLM behavior through human instructions. However, RLHF methods are complicated to implement, computationally expensive to train, and notoriously tricky to tune. In this work, we propose Alignment with Residual Energy-Based Model (ARM) as a simple and flexible alternative to RLHF methods. Our method is driven by the observation that we can learn an aligned policy by minimizing a forward Kullback–Leibler (KL) divergence from a target policy (in the form of a residual energy-based model) to a parametric policy (the LLM), instead of a reverse KL divergence as in RLHF methods. With samples from the energy-based target policy, we can leverage DPO (or other offline methods) to learn an aligned policy efficiently. ARM is simple to implement and applicable in various data settings. Our extensive experiments demonstrate its strong performance across multiple datasets, compared to strong baselines such as PPO and DPO.
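
The contrast between the forward and reverse KL objectives mentioned in the abstract can be illustrated on a toy discrete example. The sketch below is not the paper's implementation. It assumes the target policy takes the commonly used residual form, reference policy times exp(reward / beta), normalized over a finite set of candidate responses, and it takes "forward KL" to mean KL(target || policy). The function names, the three-response setup, and all numbers are hypothetical.

```python
import math

def residual_ebm(ref_probs, rewards, beta):
    """Assumed target form: p*(y) proportional to ref(y) * exp(r(y) / beta)."""
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    z = sum(weights)  # partition function, tractable only because the set is tiny
    return [w / z for w in weights]

def kl(p, q):
    """KL(p || q) for discrete distributions where q has full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy setup: three candidate responses to a single prompt.
ref = [0.5, 0.3, 0.2]       # reference (pretrained) policy
rewards = [0.0, 1.0, 2.0]   # made-up reward scores
target = residual_ebm(ref, rewards, beta=1.0)

policy = [0.4, 0.3, 0.3]    # some parametric policy to be trained

# Forward KL: the expectation is taken under the target, so it can be
# estimated from target samples (an offline, maximum-likelihood-style signal).
forward_kl = kl(target, policy)

# Reverse KL: the expectation is taken under the policy itself, which is
# what on-policy RLHF-style objectives require.
reverse_kl = kl(policy, target)

print(f"target={target}")
print(f"forward KL={forward_kl:.4f}, reverse KL={reverse_kl:.4f}")
```

In this toy setting the target shifts probability mass toward the higher-reward response relative to the reference policy, and the two divergences generally differ because KL is not symmetric. For real LLMs the normalizer cannot be enumerated, which is why sampling from the target policy matters in practice.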

Anthology ID: 2024.naacl-long.455
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 8225–8236
URL: https://aclanthology.org/2024.naacl-long.455
Cite (ACL): Bo Pang, Caiming Xiong, and Yingbo Zhou. 2024. ARM: Alignment with Residual Energy-Based Model. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8225–8236, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): ARM: Alignment with Residual Energy-Based Model (Pang et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.455.pdf