Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Zihao Yi; Qingxuan Jiang; Ruotian Ma; Xingyu Chen; Qu Yang; Mengru Wang; Fanghua Ye; Ying Shen; Zhaopeng Tu; Xiaolong Li; Liefeng Bo

Abstract

Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic personas remains largely unexamined. We hypothesize that the safety alignment of modern LLMs creates a fundamental conflict with the task of authentically role-playing morally ambiguous or villainous characters. To investigate this, we introduce the Moral RolePlay benchmark, a new dataset featuring a four-level moral alignment scale and a balanced test set for rigorous evaluation. We task state-of-the-art LLMs with role-playing characters from moral paragons to pure villains. Our large-scale evaluation reveals a consistent, monotonic decline in role-playing fidelity as character morality decreases. We find that models struggle most with traits directly antithetical to safety principles, such as "Deceitful" and "Manipulative", often substituting nuanced malevolence with superficial aggression. Furthermore, we demonstrate that general chatbot proficiency is a poor predictor of villain role-playing ability, with highly safety-aligned models performing particularly poorly. Our work provides the first systematic evidence of this critical limitation, highlighting a key tension between model safety and creative fidelity. Our benchmark and findings pave the way for developing more nuanced, context-aware alignment methods.

Anthology ID: 2026.findings-acl.282
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5724–5734
URL: https://aclanthology.org/2026.findings-acl.282/
Cite (ACL): Zihao Yi, Qingxuan Jiang, Ruotian Ma, Xingyu Chen, Qu Yang, Mengru Wang, Fanghua Ye, Ying Shen, Zhaopeng Tu, Xiaolong Li, and Liefeng Bo. 2026. Too Good to be Bad: On the Failure of LLMs to Role-Play Villains. In Findings of the Association for Computational Linguistics: ACL 2026, pages 5724–5734, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Too Good to be Bad: On the Failure of LLMs to Role-Play Villains (Yi et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.282.pdf
Checklist: 2026.findings-acl.282.checklist.pdf
