Optimizing Language Models with Fair and Stable Reward Composition in Reinforcement Learning

Authors: Jiahui Li, Hanlin Zhang, Fengda Zhang, Tai-Wei Chang, Kun Kuang, Long Chen, Jun Zhou
In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10122-10140, Miami, Florida, USA, November 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics
Citation key: li-etal-2024-optimizing-language
DOI: 10.18653/v1/2024.emnlp-main.565
URL: https://aclanthology.org/2024.emnlp-main.565/