Shreyas Chaudhari


2025

pdf bib
PersonaGym: Evaluating Persona Agents and LLMs
Vinay Samuel | Henry Peng Zou | Yue Zhou | Shreyas Chaudhari | Ashwin Kalyan | Tanmay Rajpurohit | Ameet Deshpande | Karthik R Narasimhan | Vishvak Murahari
Findings of the Association for Computational Linguistics: EMNLP 2025

Persona agents, which are LLM agents conditioned to act according to an assigned persona, enable contextually rich and user-aligned interactions across domains like education and healthcare.However, evaluating how faithfully these agents adhere to their personas remains a significant challenge, particularly in free-form settings that demand consistency across diverse, persona-relevant environments.We introduce PersonaGym, the first dynamic evaluation framework for persona agents, and PersonaScore, a human-aligned automatic metric grounded in decision theory that enables comprehensive large-scale evaluation. Our evaluation of 10 leading LLMs across 200 personas and 10,000 questions reveals significant advancement opportunities.For example, GPT-4.1 had the exact same PersonaScore as LLaMA-3-8b despite being a more recent and advanced closed-source model. Importantly, increased model size and complexity do not necessarily enhance persona agent capabilities, underscoring the need for algorithmic and architectural innovation toward faithful, performant persona agents.

2024

pdf bib
RAGs to Style: Personalizing LLMs with Style Embeddings
Abhiman Neelakanteswara | Shreyas Chaudhari | Hamed Zamani
Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024)

This paper studies the use of style embeddings to enhance author profiling for the goal of personalization of Large Language Models (LLMs). Using a style-based Retrieval-Augmented Generation (RAG) approach, we meticulously study the efficacy of style embeddings in capturing distinctive authorial nuances. The proposed method leverages this acquired knowledge to enhance the personalization capabilities of LLMs. In the assessment of this approach, we have employed the LaMP benchmark, specifically tailored for evaluating language models across diverse dimensions of personalization. The empirical observations from our investigation reveal that, in comparison to term matching or context matching, style proves to be marginally superior in the development of personalized LLMs.