Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks

Authors: Julia Kreutzer, Stefan Riezler, Carolin Lawrence
Published in: Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021)
Editors: Zornitsa Kozareva, Sujith Ravi, Andreas Vlachos, Priyanka Agrawal, André Martins
Publisher: Association for Computational Linguistics
Location: Online
Date: 2021-08
Pages: 37-43
Type: conference publication
Citation key: kreutzer-etal-2021-offline
DOI: 10.18653/v1/2021.spnlp-1.4
URL: https://aclanthology.org/2021.spnlp-1.4/