Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

Prerna Juneja; Lika Lomidze

doi:10.18653/v1/2026.acl-long.828

Abstract

There are growing concerns about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often rely on self-reported user data or interviews, offering limited insights into real-time dynamics. We present the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. Our framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation with a dialogue refinement module that preserves persona fidelity, and harm evaluation. We apply this framework to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. We construct 9 personas representing individuals with depression, anxiety, PTSD, eating disorders, and incel identity, and collect 1,674 dialogue pairs across 25 high-risk scenarios. We combine emotion modeling and LLM-assisted utterance- and harm-level classification to analyze these exchanges. Results show that Replika exhibits a narrow emotional range dominated by curiosity and care, while frequently mirroring or normalizing unsafe content such as self-harm, disordered eating, and violent-fantasy narratives. These findings highlight how controlled persona simulations can serve as a scalable testbed for evaluating safety risks in AI companions.
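The framework integrates four components: persona construction, scenario generation, multi-turn simulation with dialogue refinement, and harm evaluation. The block below is a minimal, hypothetical Python sketch of how the last two stages could be wired together; it is not the authors' implementation. All names (`Persona`, `Scenario`, `Exchange`, `simulate`, `label_harm`) are illustrative assumptions. The user simulator, refinement step, companion app, and harm classifier are passed in as plain callables standing in for LLM calls and the companion interface. The sketch also assumes that refinement is applied to each simulated user turn before it is sent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


# Hypothetical data structures; field names are illustrative, not from the paper.
@dataclass
class Persona:
    name: str
    condition: str  # e.g. "depression", "PTSD"


@dataclass
class Scenario:
    persona: Persona
    description: str


@dataclass
class Exchange:
    user_msg: str
    companion_msg: str
    harm_label: str = "unlabeled"


def simulate(scenario: Scenario,
             user_turn: Callable[[Scenario, list], str],
             refine: Callable[[Scenario, str], str],
             companion: Callable[[str], str],
             n_turns: int) -> list:
    """Run a multi-turn dialogue; each simulated user turn is passed through
    a refinement step (assumed persona-fidelity check) before it is sent."""
    history: list = []
    for _ in range(n_turns):
        draft = user_turn(scenario, history)   # simulated user proposes a turn
        msg = refine(scenario, draft)          # hypothetical persona-fidelity refinement
        reply = companion(msg)                 # companion app under test
        history.append(Exchange(msg, reply))
    return history


def label_harm(exchanges: list,
               classify: Callable[[str, str], str]) -> Counter:
    """Attach a harm label to every exchange and return label counts."""
    for ex in exchanges:
        ex.harm_label = classify(ex.user_msg, ex.companion_msg)
    return Counter(ex.harm_label for ex in exchanges)


if __name__ == "__main__":
    # Stub callables only; real use would wrap LLMs and the companion interface.
    scenario = Scenario(Persona("P1", "eating disorder"), "user mentions skipping meals")
    dialog = simulate(
        scenario,
        user_turn=lambda sc, h: f"turn {len(h)}: {sc.description}",
        refine=lambda sc, draft: draft,               # identity placeholder
        companion=lambda msg: "Tell me more about that.",
        n_turns=3,
    )
    print(label_harm(dialog, lambda u, c: "mirroring" if "more" in c else "safe"))
```

Passing the model-dependent parts in as callables keeps the loop testable with stubs. The abstract does not specify how the user simulator, refinement module, or classifier are implemented, so those choices are left open here.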

Anthology ID: 2026.acl-long.828
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Publisher: Association for Computational Linguistics
Pages: 18148–18175
URL: https://aclanthology.org/2026.acl-long.828/
DOI: 10.18653/v1/2026.acl-long.828
Cite (ACL): Prerna Juneja and Lika Lomidze. 2026. Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18148–18175, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations (Juneja & Lomidze, ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.828.pdf
Checklist: 2026.acl-long.828.checklist.pdf
