Prototypical Reward Network for Data-Efficient Model Alignment

Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, Kunpeng Liu


Abstract
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). This paper explores enhancing RLHF with Prototypical Networks to improve reward models. We propose a framework that uses Prototypical Networks to strengthen reward models under limited human feedback, enabling more stable and reliable structural learning from fewer samples. This improves the model's adaptability and accuracy in interpreting human preferences. Our experiments demonstrate that this approach significantly improves the performance of reward models and LLMs on human-feedback tasks, surpassing traditional methods, especially in data-limited scenarios.
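To make the core idea concrete, the sketch below shows a generic prototypical-network-style reward head in PyTorch: embeddings of a few labelled feedback examples are averaged into per-class prototypes (e.g. "rejected" vs. "chosen"), and a new response is scored by its distance to those prototypes. This is a minimal illustration of the prototypical-network mechanism only; the class layout, dimensions, and distance-based scoring rule are assumptions for exposition, not the paper's actual architecture or training objective.

```python
# Illustrative prototypical-network-style reward head (PyTorch).
# Hypothetical names and dimensions; not the paper's implementation.
import torch
import torch.nn as nn


class PrototypicalRewardHead(nn.Module):
    def __init__(self, hidden_dim: int, proto_dim: int = 128):
        super().__init__()
        # Project LLM hidden states into a metric space where prototypes live.
        self.proj = nn.Linear(hidden_dim, proto_dim)

    def compute_prototypes(self, support_emb: torch.Tensor,
                           support_labels: torch.Tensor) -> torch.Tensor:
        """Average projected support embeddings per preference class
        (e.g. 0 = rejected, 1 = chosen) to obtain one prototype per class."""
        z = self.proj(support_emb)                          # (n_support, proto_dim)
        protos = torch.stack([z[support_labels == c].mean(dim=0)
                              for c in torch.unique(support_labels)])
        return protos                                       # (n_classes, proto_dim)

    def forward(self, query_emb: torch.Tensor,
                prototypes: torch.Tensor) -> torch.Tensor:
        """Score queries by negative squared Euclidean distance to each prototype;
        the softmax over classes gives P(class), usable as a scalar reward."""
        q = self.proj(query_emb)                            # (n_query, proto_dim)
        dists = torch.cdist(q, prototypes) ** 2             # (n_query, n_classes)
        return torch.softmax(-dists, dim=-1)


# Usage sketch: embeddings would come from the reward model's backbone LLM
# (hidden_dim of 4096 is an assumption).
head = PrototypicalRewardHead(hidden_dim=4096)
support = torch.randn(8, 4096)                              # few labelled feedback examples
labels = torch.tensor([0, 1, 0, 1, 1, 0, 1, 0])             # 0 = rejected, 1 = chosen
protos = head.compute_prototypes(support, labels)
reward = head(torch.randn(2, 4096), protos)[:, 1]           # P(chosen) as reward signal
```

Because prototypes are simple class means over a handful of support examples, the head imposes structure on the reward space without needing large amounts of preference data, which is the intuition behind the data-efficiency claim.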
Anthology ID: 2024.acl-long.748
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 13871–13884
URL: https://aclanthology.org/2024.acl-long.748
Cite (ACL): Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, and Kunpeng Liu. 2024. Prototypical Reward Network for Data-Efficient Model Alignment. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13871–13884, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Prototypical Reward Network for Data-Efficient Model Alignment (Zhang et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.748.pdf