Deep Reinforcement Learning of LLMs using RLHF

Enoch Levandovsky


Abstract
My main research interest lies in applying Reinforcement Learning (RL) alignment of LLMs to human-robot dialogue. More specifically, my latest research uses RL alignment as an efficient training regime to train a newly initialized tiny LM to behave like a toddler. Previous research highlights the difficulty of building a robust tiny LM with an educated adult's level of understanding. Our hypothesis is that the lower cognitive bar of at least behaving like a child is achievable with a very small number of parameters, especially when training efficiently with an RL-based LLM training regime. My interests also extend to applying RL to LLM training for dialogue management and planning.
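
As a hedged illustration of the RL alignment objective the abstract refers to (maximize reward while staying close to a reference model), the sketch below implements a toy REINFORCE-style update with a KL penalty in PyTorch. Everything here is an assumption for illustration: the TinyLM architecture, the reward_fn placeholder, the BETA coefficient, and the random rollouts are not the author's implementation.

# Minimal sketch of an RLHF-style objective:
#   maximize E[ r(x, y) ] - BETA * KL(pi_theta || pi_ref)
# All names below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, BETA = 100, 32, 0.1

class TinyLM(nn.Module):
    # A deliberately small model standing in for the "tiny LM".
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # next-token logits

policy, ref = TinyLM(), TinyLM()
ref.load_state_dict(policy.state_dict())  # frozen reference copy
for p in ref.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def reward_fn(tokens):
    # Placeholder reward; in practice a trained preference model.
    return (tokens % 2 == 0).float().mean(dim=-1)

# Toy rollouts; a real setup samples these from the policy itself.
tokens = torch.randint(0, VOCAB, (8, 16))
logp = F.log_softmax(policy(tokens), dim=-1)
ref_logp = F.log_softmax(ref(tokens), dim=-1)

# Log-probability of the rolled-out tokens under each model.
act_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
act_ref = ref_logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)

# Single-sample per-sequence KL estimate (valid when rollouts come
# from the policy), folded into the reward as a penalty.
kl = (act_logp - act_ref).sum(dim=-1)
advantage = (reward_fn(tokens) - BETA * kl).detach()

# REINFORCE-style loss: push up log-probs weighted by advantage.
loss = -(advantage * act_logp.sum(dim=-1)).mean()
opt.zero_grad()
loss.backward()
opt.step()

Production RLHF pipelines typically replace this plain REINFORCE step with PPO-style clipped updates and a learned reward model; the sketch only shows the KL-penalized objective that both share.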
Anthology ID:
2025.yrrsds-1.2
Volume:
Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
Month:
August
Year:
2025
Address:
Avignon, France
Editors:
Ryan Whetten, Virgile Sucal, Anh Ngo, Kranti Chalamalasetti, Koji Inoue, Gaetano Cimino, Zachary Yang, Yuki Zenimoto, Ricardo Rodriguez
Venue:
YRRSDS
Publisher:
Association for Computational Linguistics
Pages:
4–5
URL:
https://aclanthology.org/2025.yrrsds-1.2/
Cite (ACL):
Enoch Levandovsky. 2025. Deep Reinforcement Learning of LLMs using RLHF. In Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems, pages 4–5, Avignon, France. Association for Computational Linguistics.
Cite (Informal):
Deep Reinforcement Learning of LLMs using RLHF (Levandovsky, YRRSDS 2025)
PDF:
https://aclanthology.org/2025.yrrsds-1.2.pdf