RL with KL penalties is better viewed as Bayesian inference

Tomasz Korbak, Ethan Perez, Christopher Buckley
In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1083-1091. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, December 2022.
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Anthology ID: korbak-etal-2022-rl
DOI: 10.18653/v1/2022.findings-emnlp.77
URL: https://aclanthology.org/2022.findings-emnlp.77/