Sarfaroz Yunusov

2025

Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks
Sarfaroz Yunusov | Kaige Chen | Kazi Nishat Anwar | Ali Emami
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

As Large Language Models (LLMs) increasingly integrate into everyday workflows, where users shape outcomes through multi-turn collaboration, a critical question emerges: do users with different personality traits systematically prefer certain LLMs over others? We conduc-ted a study with 32 participants evenly distributed across four Keirsey personality types, evaluating their interactions with GPT-4 and Claude 3.5 across four collaborative tasks: data analysis, creative writing, information retrieval, and writing assistance. Results revealed significant personality-driven preferences: *Rationals* strongly preferred GPT-4, particularly for goal-oriented tasks, while *idealists* favored Claude 3.5, especially for creative and analytical tasks. Other personality types showed task-dependent preferences. Sentiment analysis of qualitative feedback confirmed these patterns. Notably, aggregate helpfulness ratings were similar across models, showing how personality-based analysis reveals LLM differences that traditional evaluations miss.

2024

pdf bib abs

Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models
Abhishek Kumar | Sarfaroz Yunusov | Ali Emami
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Research on Large Language Models (LLMs) has often neglected subtle biases that, although less apparent, can significantly influence the models’ outputs toward particular social narratives. This study addresses two such biases within LLMs: representative bias, which denotes a tendency of LLMs to generate outputs that mirror the experiences of certain identity groups, and affinity bias, reflecting the models’ evaluative preferences for specific narratives or viewpoints. We introduce two novel metrics to measure these biases: the Representative Bias Score (RBS) and the Affinity Bias Score (ABS), and present the Creativity-Oriented Generation Suite (CoGS), a collection of open-ended tasks such as short story writing and poetry composition, designed with customized rubrics to detect these subtle biases. Our analysis uncovers marked representative biases in prominent LLMs, with a preference for identities associated with being white, straight, and men. Furthermore, our investigation of affinity bias reveals distinctive evaluative patterns within each model, akin to ‘bias fingerprints’. This trend is also seen in human evaluators, highlighting a complex interplay between human and machine bias perceptions.

pdf bib abs

MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models
Sarfaroz Yunusov | Hamza Sidat | Ali Emami
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

This study explores the effectiveness of Large Language Models (LLMs) in creating personalized “mirror stories” that reflect and resonate with individual readers’ identities, addressing the significant lack of diversity in literature. We present MirrorStories, a corpus of 1,500 personalized short stories generated by integrating elements such as name, gender, age, ethnicity, reader interest, and story moral. We demonstrate that LLMs can effectively incorporate diverse identity elements into narratives, with human evaluators identifying personalized elements in the stories with high accuracy. Through a comprehensive evaluation involving 26 diverse human judges, we compare the effectiveness of MirrorStories against generic narratives. We find that personalized LLM-generated stories not only outscore generic human-written and LLM-generated ones across all metrics of engagement (with average ratings of 4.22 versus 3.37 on a 5-point scale), but also achieve higher textual diversity while preserving the intended moral. We also provide analyses that include bias assessments and a study on the potential for integrating images into personalized stories.

Co-authors

Venues

EMNLP2
ACL1

Fix author