Jinghua Zhao
2026
EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations
Haoqin Sun | Jinghua Zhao | Xuechen Wang | Shiwan Zhao | Jiaming Zhou | Hui Wang | Xi Yang | Yequan Wang | Yonghua Lin
Findings of the Association for Computational Linguistics: ACL 2026
The advancement of Multimodal Emotion Recognition (MER) in Chinese is significantly hindered by the scarcity of high-quality, spontaneous dialogue datasets compared to their English counterparts. In this work, we introduce EmotionTalk, the first interactive Chinese multimodal dataset designed to capture the nuance of authentic emotional interplay. Collected from 19 professional actors, the dataset spans 23.6 hours of dyadic conversations across diverse scenarios. A key contribution of EmotionTalk is its multi-grained annotation system, which integrates standard categorical and dimensional labels with fine-grained emotional speaking style captions, enabling research into interpretable emotion analysis. We establish comprehensive benchmarks for emotion recognition and captioning tasks, verifying the dataset’s effectiveness and the necessity of multimodal fusion. EmotionTalk serves as a critical resource for bridging the gap in non-English affective computing and is publicly released for the research community.
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
Hui Wang | Jinghua Zhao | Yifan Yang | Shujie Liu | Junyang Chen | Yanzhe Zhang | Shiwan Zhao | Jinyu Li | Jiaming Zhou | Haoqin Sun | Yan Lu | Yong Qin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Generative speech technologies are progressing rapidly, but evaluating the perceptual quality of synthetic speech remains a core challenge. Existing methods typically rely on scalar scores or binary decisions, which lack interpretability and generalize poorly across tasks and languages. We present SpeechLLM-as-Judges, a new paradigm that enables large language models (LLMs) to conduct structured, explanation-based speech quality evaluation. To support this direction, we introduce SpeechEval, a large-scale dataset containing 32,207 multilingual speech clips and 128,754 annotations spanning four tasks: quality assessment, pairwise comparison, improvement suggestion, and deepfake detection. Based on this resource, we develop SQ-LLM, a speech-quality-aware LLM trained with chain-of-thought reasoning and reward optimization to strengthen its evaluation capability. Experimental results show that SQ-LLM delivers strong performance across tasks and languages, revealing the potential of this paradigm for advancing speech quality evaluation. The relevant code, models, and data are publicly available at https://github.com/NKU-HLT/SpeechLLM-as-Judges.