Rithesh R N


2024

PRAct: Optimizing Principled Reasoning and Acting of LLM Agent
Zhiwei Liu | Weiran Yao | Jianguo Zhang | Zuxin Liu | Liangwei Yang | Rithesh R N | Tian Lan | Ming Zhu | Juntao Tan | Shirley Kokane | Thai Quoc Hoang | Juan Carlos Niebles | Shelby Heinecke | Huan Wang | Silvio Savarese | Caiming Xiong
Proceedings of the 28th Conference on Computational Natural Language Learning

We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We investigate the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, we developed two RPO methods, RPO-Traj and RPO-Batch, to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, can effectively learn and apply action principles to enhance performance.
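
The execute, reflect, optimize cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, type aliases, and toy stubs below are all hypothetical, and real reflectors and optimizers would presumably be LLM calls rather than string rules. The `use_reward` flag only mirrors the stated difference between Reward-RPO (reflection sees the environment reward) and Self-RPO (it does not); RPO-Traj and RPO-Batch are not modeled because the abstract does not define them.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types; the abstract does not specify any API.
Trajectory = List[str]
Executor = Callable[[str], Tuple[Trajectory, float]]
Reflector = Callable[[str, Trajectory, Optional[float]], str]
Optimizer = Callable[[str, str], str]


def rpo_loop(principles: str, execute: Executor, reflect: Reflector,
             optimize: Optimizer, steps: int, use_reward: bool) -> str:
    """Repeat: run the agent, critique the principles, update the principles."""
    for _ in range(steps):
        trajectory, reward = execute(principles)
        # Reward-RPO passes the reward to the reflector; Self-RPO hides it.
        signal = reward if use_reward else None
        critique = reflect(principles, trajectory, signal)
        principles = optimize(principles, critique)
    return principles


# Toy stand-ins (assumptions for illustration only).
def toy_execute(principles: str) -> Tuple[Trajectory, float]:
    reward = 1.0 if "verify" in principles else 0.0
    return ["search", "click"], reward


def toy_reflect(principles: str, trajectory: Trajectory,
                reward: Optional[float]) -> str:
    # Without a reward signal this toy reflector has nothing to critique.
    if reward is not None and reward < 1.0:
        return "add: verify the result before finishing"
    return ""


def toy_optimize(principles: str, critique: str) -> str:
    prefix = "add: "
    if critique.startswith(prefix):
        return principles + "\n- " + critique[len(prefix):]
    return principles


final = rpo_loop("- search before acting", toy_execute, toy_reflect,
                 toy_optimize, steps=3, use_reward=True)
print(final)
```

In this toy setup the principles change only when the reflector produces a critique, so the loop settles once the reward reaches its maximum. Swapping the stubs for model-backed callables would keep the same control flow.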