Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024)

Ameet Deshpande, EunJeong Hwang, Vishvak Murahari, Joon Sung Park, Diyi Yang, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan (Editors)

Anthology ID: 2024.personalize-1
Month: March
Year: 2024
Address: St. Julians, Malta
Venues: PERSONALIZE | WS
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2024.personalize-1
PDF: https://aclanthology.org/2024.personalize-1.pdf


RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language Models
Meiling Tao | Liang Xuechen | Tianyu Shi | Lei Yu | Yiting Xie

This study presents RoleCraft-GLM, an innovative framework aimed at enhancing personalized role-playing with Large Language Models (LLMs). RoleCraft-GLM addresses the lack of personalized interactions in conversational AI and offers a solution with detailed, emotionally nuanced character portrayals. We contribute a unique conversational dataset that shifts from conventional celebrity-centric characters to diverse, non-celebrity personas, enhancing the realism and complexity of language modeling interactions. Our approach also includes meticulous character development, ensuring that dialogues are both realistic and emotionally resonant. The effectiveness of RoleCraft-GLM is validated through various case studies, highlighting its versatility and skill in different scenarios. Our framework excels in generating dialogues that accurately reflect characters’ personality traits and emotions, thereby boosting user engagement. In conclusion, by enabling more nuanced and emotionally rich dialogues, RoleCraft-GLM marks a significant leap in personalized AI interactions and paves the way for more authentic and immersive AI-assisted role-playing experiences.

How to use Language Models for Synthetic Text Generation in Cerebrovascular Disease-specific Medical Reports
Byoung-Doo Oh | Gi-Youn Kim | Chulho Kim | Yu-Seop Kim

The quantity and quality of data have a significant impact on the performance of artificial intelligence (AI). However, in the biomedical domain, data often contain sensitive information such as personal details, making it challenging to secure enough data for medical AI. Consequently, there is growing interest in synthetic data generation for medical AI. However, research has primarily focused on medical images, with little attention given to text-based data such as medical records. Therefore, this study explores the application of language models (LMs) to synthetic text generation in low-resource domains like medical records and compares the synthetic text produced by different LMs. To this end, we focused on two criteria for LM-based synthetic generation of medical records from two keywords entered by the user: 1) the impact of the LM’s knowledge and 2) the impact of the LM’s size. Additionally, we objectively evaluated the generated synthetic text using representative metrics such as BLEU and ROUGE, along with clinicians’ evaluations.
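ROUGE, one of the metrics named above, scores generated text by its n-gram overlap with a reference. The sketch below computes ROUGE-1 F1 (unigram overlap) and is only an illustration, not the paper's evaluation script: it assumes lowercase whitespace tokenization with no stemming, and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between a candidate and a reference.

    Assumes lowercase whitespace tokenization; no stemming or stopword removal.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences for illustration, not taken from the study's data.
reference = "patient presented with acute ischemic stroke in the left hemisphere"
candidate = "patient presented with acute stroke in the left hemisphere"
print(round(rouge1_f1(candidate, reference), 3))  # 0.947
```

Here all 9 candidate tokens appear in the 10-token reference, so precision is 1.0, recall is 0.9, and F1 is 18/19. Published evaluations usually rely on established ROUGE implementations that handle tokenization and stemming more carefully.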

Assessing Generalization for Subpopulation Representative Modeling via In-Context Learning
Gabriel Simmons | Vladislav Savinov

This study evaluates the ability of Large Language Model (LLM)-based Subpopulation Representative Models (SRMs) to generalize from empirical data, utilizing in-context learning with data from the 2016 and 2020 American National Election Studies. We explore generalization across response variables and demographic subgroups. While conditioning with empirical data improves performance on the whole, the benefit of in-context learning varies considerably across demographics, sometimes hurting performance for one demographic while helping performance for others. The inequitable benefits of in-context learning for SRM present a challenge for practitioners implementing SRMs, and for decision-makers who might come to rely on them. Our work highlights a need for fine-grained benchmarks captured from diverse subpopulations that test not only fidelity but generalization.
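The abstract does not specify the prompt format used for in-context conditioning. The following is a minimal sketch, assuming a plain-text template in which empirical survey responses from respondents with known demographics serve as examples before the target query. The `SurveyRecord` fields, template wording, and values are hypothetical placeholders, not ANES variables or the authors' prompts.

```python
from dataclasses import dataclass


@dataclass
class SurveyRecord:
    # Hypothetical record layout; the ANES variables used in the study may differ.
    demographics: str
    question: str
    answer: str


def build_icl_prompt(examples: list[SurveyRecord], target: SurveyRecord) -> str:
    """Prepend empirical survey responses as in-context examples, then pose the target query."""
    blocks = [
        f"Respondent: {ex.demographics}\nQuestion: {ex.question}\nAnswer: {ex.answer}"
        for ex in examples
    ]
    # The target respondent's answer is left blank for the LLM to complete.
    blocks.append(f"Respondent: {target.demographics}\nQuestion: {target.question}\nAnswer:")
    return "\n\n".join(blocks)


# Placeholder values for illustration only, not real survey data.
examples = [SurveyRecord("age 30-44, Midwest", "Hypothetical question A?", "Option 1")]
target = SurveyRecord("age 65+, South", "Hypothetical question A?", "")
print(build_icl_prompt(examples, target))
```

Which examples are chosen (for instance, from the same or a different demographic subgroup than the target) is the kind of choice whose effect, per the abstract, varies considerably across demographics.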

HumSum: A Personalized Lecture Summarization Tool for Humanities Students Using LLMs
Zahra Kolagar | Alessandra Zarcone

Generative AI systems aim to create customizable content for their users, and demand for adaptable tools that can create personalized experiences has surged accordingly. This paper presents HumSum, a web-based tool tailored for humanities students to summarize their lecture transcripts effectively and personalize the summaries to their specific needs. We first conducted a scenario-driven survey to collect user preferences that guided the tool’s implementation. We built the user interface with Streamlit, while LangChain’s Map Reduce function handled the summarization of long lectures using OpenAI’s GPT-4 model. HumSum is an intuitive tool that serves various summarization needs, building personalization into its functionality without requiring the collection of personal user data.
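Map-reduce summarization handles transcripts that are too long for a single model call: each chunk is summarized separately (map), and the partial summaries are then summarized together (reduce). The sketch below shows the pattern in plain Python. It does not use LangChain's API. The `summarize` argument stands in for an LLM call such as GPT-4, and the word-based chunking and `first_words` toy summarizer are hypothetical choices for illustration.

```python
from typing import Callable


def chunk_words(text: str, max_words: int) -> list[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summarize(text: str, summarize: Callable[[str], str], max_words: int = 500) -> str:
    # Map step: summarize each chunk independently so no single call exceeds the context budget.
    partial = [summarize(chunk) for chunk in chunk_words(text, max_words)]
    # Reduce step: summarize the concatenated partial summaries into one final summary.
    return summarize("\n".join(partial))


def first_words(text: str) -> str:
    """Hypothetical stand-in for an LLM summarizer: keep the first three words."""
    return " ".join(text.split()[:3])


transcript = " ".join(f"w{i}" for i in range(10))
print(map_reduce_summarize(transcript, first_words, max_words=4))  # w0 w1 w2
```

This sketch performs a single reduce step. If the joined partial summaries can themselves exceed the context window, a practical implementation would collapse them recursively.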

Can I trust You? LLMs as conversational agents
Marc Döbler | Raghavendran Mahendravarman | Anna Moskvina | Nasrin Saef

With the rising popularity of LLMs in the public sphere, they are becoming increasingly attractive as a tool for doing one’s own research without having to rely on search engines or specialized knowledge of a scientific field. But using LLMs as a source of factual information can lead one to fall prey to misinformation or hallucinations dreamed up by the model. In this paper, we examine the GPT-4 LLM by simulating a large number of potential research queries and evaluating how many of the generated references actually exist and are factually correct.

Emulating Author Style: A Feasibility Study of Prompt-enabled Text Stylization with Off-the-Shelf LLMs
Avanti Bhandarkar | Ronald Wilson | Anushka Swarup | Damon Woodard

User-centric personalization of text opens many avenues of application, from stylized email composition to machine translation. Existing approaches in this domain often face limitations in data and resource requirements. Drawing inspiration from the success of resource-efficient prompt-enabled stylization in related fields, this work conducts the first feasibility study testing 12 pre-trained state-of-the-art (SOTA) LLMs for author style emulation. Although promising, the results suggest that current off-the-shelf LLMs fall short of achieving effective author style emulation. This work provides valuable insights into how off-the-shelf LLMs could potentially be used for user-centric personalization easily and at scale.

LLMs Simulate Big5 Personality Traits: Further Evidence
Aleksandra Sorokovikova | Sharwin Rezagholi | Natalia Fedorova | Ivan P. Yamshchikov

An empirical investigation into the simulation of the Big5 personality traits by large language models (LLMs), namely Llama-2, GPT-4, and Mixtral, is presented. We analyze the personality traits simulated by these models and their stability. This contributes to the broader understanding of the capabilities of LLMs to simulate personality traits and the respective implications for personalized human-computer interaction.

Personalized Text Generation with Fine-Grained Linguistic Control
Bashar Alhafni | Vivek Kulkarni | Dhruv Kumar | Vipul Raheja

As the text generation capabilities of large language models become increasingly prominent, recent studies have focused on controlling particular aspects of the generated text to make it more personalized. However, most research on controllable text generation focuses on controlling the content or modeling specific high-level/coarse-grained attributes that reflect authors’ writing styles, such as formality, domain, or sentiment. In this paper, we focus on controlling fine-grained attributes spanning multiple linguistic dimensions, such as lexical and syntactic attributes. We introduce a novel benchmark to train generative models and evaluate their ability to generate personalized text based on multiple fine-grained linguistic attributes. We systematically investigate the performance of various large language models on our benchmark and draw insights into the factors that impact their performance. We make our code, data, models, and benchmarks publicly available.

LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models
Ivar Frisch | Mario Giulianelli

Agent interaction has long been a key topic in psychology, philosophy, and artificial intelligence, and it is now gaining traction in large language model (LLM) research. This experimental study seeks to lay the groundwork for our understanding of dialogue-based interaction between LLMs: Do persona-prompted LLMs show consistent personality and language use in interaction? We condition GPT-3.5 on asymmetric personality profiles to create a population of LLM agents, administer personality tests, and submit the agents to a collaborative writing task. We find that different profiles exhibit different degrees of personality consistency and linguistic alignment in interaction.

Quantifying learning-style adaptation in effectiveness of LLM teaching
Ruben Weijers | Gabrielle Fidelis de Castilho | Jean-François Godbout | Reihaneh Rabbany | Kellin Pelrine

This preliminary study investigates whether AI, when prompted based on individual learning styles, can effectively improve comprehension and learning experiences in educational settings. It involves tailoring baseline LLM prompts and comparing the results of a control group receiving standard content with those of an experimental group receiving learning-style-tailored content. Preliminary results suggest that GPT-4 can generate responses aligned with various learning styles, indicating the potential for enhanced engagement and comprehension. However, these results also reveal challenges, including the model’s tendency toward sycophantic behavior and variability in its responses. Our findings suggest that a more sophisticated prompt engineering approach is required for integrating AI into education (AIEd) to improve educational outcomes.

RAGs to Style: Personalizing LLMs with Style Embeddings
Abhiman Neelakanteswara | Shreyas Chaudhari | Hamed Zamani

This paper studies the use of style embeddings to enhance author profiling for the personalization of Large Language Models (LLMs). Using a style-based Retrieval-Augmented Generation (RAG) approach, we meticulously study the efficacy of style embeddings in capturing distinctive authorial nuances. The proposed method leverages this acquired knowledge to enhance the personalization capabilities of LLMs. To assess this approach, we employ the LaMP benchmark, which is specifically designed for evaluating language models across diverse dimensions of personalization. Our empirical observations reveal that, compared to term matching or context matching, style proves to be marginally superior for developing personalized LLMs.
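Style-based retrieval ranks a user's past documents by how closely their style embeddings match the current input, then places the top matches in the prompt. The sketch below illustrates this with cosine similarity. The 2-d "style vectors", function names, and prompt template are hypothetical. A real system would obtain the vectors from a trained style encoder, and the paper's exact retrieval and prompting setup may differ.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_by_style(query_style: list[float],
                      profile: list[tuple[str, list[float]]],
                      k: int) -> list[str]:
    """Return the k profile texts whose style vectors are most similar to the query's."""
    ranked = sorted(profile, key=lambda item: cosine(query_style, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(task_input: str, retrieved: list[str]) -> str:
    # Retrieved samples of the user's own writing are placed before the task.
    context = "\n".join(f"- {text}" for text in retrieved)
    return f"Examples of this user's writing:\n{context}\n\nTask: {task_input}"


# Toy style vectors for illustration only.
profile = [
    ("Short and punchy.", [1.0, 0.0]),
    ("A long, meandering reflection on the week.", [0.0, 1.0]),
    ("Quick note.", [0.9, 0.1]),
]
top = retrieve_by_style([1.0, 0.0], profile, k=2)
print(build_prompt("Write a title for this post.", top))
```

The term-matching and context-matching baselines mentioned in the abstract would replace only the scoring function (for example, lexical overlap or semantic-content embeddings). The retrieve-then-prompt structure stays the same.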

User Embedding Model for Personalized Language Prompting
Sumanth Doddapaneni | Krishna Sayana | Ambarish Jash | Sukhdeep Sodhi | Dima Kuzmin

Modeling long user histories plays a pivotal role in enhancing recommendation systems, allowing them to capture users’ evolving preferences and produce more precise, personalized recommendations. In this study, we tackle the challenges of modeling long user histories for preference understanding in natural language. Specifically, we introduce a new User Embedding Module (UEM) that efficiently processes free-form text user histories by compressing them into embeddings, which are then used as soft prompts to a language model (LM). Our experiments demonstrate that this approach handles significantly longer histories than conventional text-based methods and yields substantial improvements in predictive performance: models trained with our approach improve by up to 0.21 and 0.25 F1 points over the text-based prompting baselines. The main contribution of this research is to demonstrate the ability to bias language models via user signals.
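A soft prompt is a sequence of continuous vectors placed before the input token embeddings, rather than extra text tokens. The abstract does not describe the UEM's internals. The sketch below therefore assumes that each history item has already been encoded into a fixed-size vector and that a single linear projection (trainable in practice, fixed here) maps those vectors into the LM's embedding space. `project`, `prepend_soft_prompt`, and all values are hypothetical.

```python
def project(vec: list[float], weights: list[list[float]]) -> list[float]:
    """Linear projection: each row of `weights` yields one output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def prepend_soft_prompt(history_embeddings: list[list[float]],
                        token_embeddings: list[list[float]],
                        weights: list[list[float]]) -> list[list[float]]:
    """Map each history-item embedding into the LM's embedding space and place the
    results before the embedded input tokens, where they act as soft-prompt vectors."""
    soft_prompt = [project(h, weights) for h in history_embeddings]
    return soft_prompt + token_embeddings


# Toy dimensions: history vectors of size 3, LM embeddings of size 2.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
history = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
tokens = [[0.1, 0.2], [0.3, 0.4]]
inputs = prepend_soft_prompt(history, tokens, weights)
print(inputs)  # [[1.0, 2.0], [4.0, 5.0], [0.1, 0.2], [0.3, 0.4]]
```

Each history item costs one input position instead of its full token length. This illustrates why compressing histories into embeddings lets the LM take in longer histories than text-based prompting. In practice, this would be done with tensors in a deep-learning framework.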