ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents

Navid Madani; Rohini K. Srihari

doi:10.18653/v1/2025.emnlp-main.811

Abstract

Large Language Models (LLMs) increasingly power mental-health chatbots, yet the field still lacks a scalable, theory-grounded way to decide which model is more effective to deploy. We present ESC-Judge, the first end-to-end evaluation framework that (i) grounds head-to-head comparison of Emotional-Support LLMs (ES-LLMs) in an established psychological theory, Clara Hill’s Exploration–Insight–Action (E-I-A) counselling model, thereby delivering a structured, interpretable lens on performance, and (ii) fully automates the pipeline at scale. ESC-Judge proceeds in three stages: (1) it synthesizes realistic help-seeker roles by sampling empirically salient attributes (stressors, personality, life history); (2) it has two candidate ES-Agents conduct separate sessions with the same role, isolating model-specific strategies; and (3) it asks a specialised judge LLM to issue pairwise preferences across rubric-anchored skills that exhaustively cover the E-I-A spectrum. In our empirical study, ESC-Judge matches PhD-level annotators in 85% of Exploration, 83% of Insight, and 86% of Action decisions, demonstrating human-level reliability at a fraction of the cost. We release all code, prompts, synthetic roles, transcripts, and judgment scripts to catalyze transparent progress in emotionally supportive AI.
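The three stages described in the abstract can be pictured with a minimal sketch. This is not the authors' released code: the attribute pools, the `Role` class, the stub agents, and the length-based `length_judge` are all hypothetical placeholders, and a real setup would presumably back the help-seeker, both agents, and the judge with LLM calls and score individual rubric-anchored skills rather than whole stages.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical attribute pools; the abstract does not list the real ones.
STRESSORS = ["job loss", "breakup", "exam pressure"]
PERSONALITIES = ["introverted", "anxious", "outgoing"]
LIFE_HISTORIES = ["recently moved cities", "caring for a parent"]

# The three stages of Hill's Exploration-Insight-Action model.
STAGES = ("Exploration", "Insight", "Action")


@dataclass(frozen=True)
class Role:
    stressor: str
    personality: str
    life_history: str


# (role, transcript so far) -> next agent reply
Agent = Callable[[Role, List[str]], str]
# (transcript_a, transcript_b, stage) -> "A" or "B"
Judge = Callable[[List[str], List[str], str], str]


def sample_role(rng: random.Random) -> Role:
    """Stage 1: synthesize a help-seeker role by sampling attributes."""
    return Role(
        rng.choice(STRESSORS),
        rng.choice(PERSONALITIES),
        rng.choice(LIFE_HISTORIES),
    )


def run_session(agent: Agent, role: Role, turns: int = 3) -> List[str]:
    """Stage 2: one agent holds its own session with the given role."""
    transcript: List[str] = []
    for i in range(turns):
        # Stub seeker turn; a real pipeline would simulate the seeker too.
        transcript.append(f"seeker: (turn {i}) I am dealing with {role.stressor}.")
        transcript.append("agent: " + agent(role, transcript))
    return transcript


def compare(agent_a: Agent, agent_b: Agent, judge: Judge, role: Role) -> Dict[str, str]:
    """Stage 3: collect one pairwise preference per E-I-A stage."""
    transcript_a = run_session(agent_a, role)
    transcript_b = run_session(agent_b, role)
    return {stage: judge(transcript_a, transcript_b, stage) for stage in STAGES}


def terse_agent(role: Role, transcript: List[str]) -> str:
    return "That sounds hard."


def probing_agent(role: Role, transcript: List[str]) -> str:
    return f"That sounds hard. What about {role.stressor} weighs on you most?"


def length_judge(transcript_a: List[str], transcript_b: List[str], stage: str) -> str:
    # Placeholder heuristic (prefers the longer transcript); not a real judge.
    len_a = sum(len(line) for line in transcript_a)
    len_b = sum(len(line) for line in transcript_b)
    return "A" if len_a >= len_b else "B"


role = sample_role(random.Random(0))
verdicts = compare(terse_agent, probing_agent, length_judge, role)
print(verdicts)
```

Both agents see the same sampled role in separate sessions, so any difference in the verdicts reflects the agents' own strategies rather than differences in the help-seeker.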

Anthology ID: 2025.emnlp-main.811
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 16048–16065
URL: https://aclanthology.org/2025.emnlp-main.811/
DOI: 10.18653/v1/2025.emnlp-main.811
Cite (ACL): Navid Madani and Rohini Srihari. 2025. ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 16048–16065, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents (Madani & Srihari, EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.811.pdf
Checklist: 2025.emnlp-main.811.checklist.pdf