Enhancing Reinforcement Learning with Dense Rewards from Language Model Critic

Authors: Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng
Published: 2024-11
Venue: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics
Location: Miami, Florida, USA
Type: conference publication
Anthology ID: cao-etal-2024-enhancing
DOI: 10.18653/v1/2024.emnlp-main.515
URL: https://aclanthology.org/2024.emnlp-main.515/
Pages: 9119-9138