TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback

Authors: Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Nam, Daejin Jo, Kyoung-Woon On, Mark Hasegawa-Johnson, Sungwoong Kim, Chang Yoo
Published in: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Pages: 14969-14981
Type: text (conference publication)
Citation key: yoon-etal-2024-tlcr
DOI: 10.18653/v1/2024.findings-acl.889
URL: https://aclanthology.org/2024.findings-acl.889/