Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

Prerna Juneja, Lika Lomidze


Abstract
There are growing concerns about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often rely on self-reported user data or interviews, offering limited insights into real-time dynamics. We present the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. Our framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation with a dialogue refinement module that preserves persona fidelity, and harm evaluation. We apply this framework to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. We construct 9 personas representing individuals with depression, anxiety, PTSD, eating disorders, and incel identity, and collect 1,674 dialogue pairs across 25 high-risk scenarios. We combine emotion modeling and LLM–assisted utterance-and harm-level classification to analyze these exchanges. Results show that Replika exhibits a narrow emotional range dominated by curiosity and care, while frequently mirroring or normalizing unsafe content such as self-harm, disordered eating, and violent-fantasy narratives. These findings highlight how controlled persona simulations can serve as a scalable testbed for evaluating safety risks in AI companions.
Anthology ID:
2026.acl-long.828
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
18148–18175
Language:
URL:
https://aclanthology.org/2026.acl-long.828/
DOI:
10.18653/v1/2026.acl-long.828
Bibkey:
Cite (ACL):
Prerna Juneja and Lika Lomidze. 2026. Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18148–18175, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations (Juneja & Lomidze, ACL 2026)
Copy Citation:
PDF:
https://aclanthology.org/2026.acl-long.828.pdf
Checklist:
 2026.acl-long.828.checklist.pdf