Evaluating the Expressive Appropriateness of Speech in Rich Contexts

Tianrui Wang; Ziyang Ma; Yizhou Peng; Haoyu Wang; Zhikang Niu; Zikang Huang; Yihao Wu; Yi-Wen Chao; Yu Jiang; Yuheng Lu; Guanrou Yang; Xuanchen Li; Hexin Liu; Chunyu Qiang; Cheng Gong; Yifan Yang; Tianchi Liu; Junyu Wang; Nana Hou; Meng Ge; Fuming You; Yang Wei; Zhongqian Sun; Hu Haifeng; Xiaobao Wang; Eng Siong Chng; Xie Chen; Longbiao Wang; Jianwu Dang

Abstract

Evaluating expressive speech remains challenging, as existing methods mainly assess emotional intensity and overlook whether a speech sample is expressively appropriate for its contextual setting. This limitation hinders reliable evaluation of speech systems used in narrative-driven and interactive applications, such as audiobooks and conversational agents. We introduce CEAEval, a Context-rich framework for Evaluating Expressive Appropriateness in speech, which assesses whether a speech sample expressively aligns with the underlying communicative intent implied by its discourse-level narrative context. To support this task, we construct CEAEval-D, the first context-rich speech dataset with real human performances in Mandarin conversational speech, providing narrative descriptions together with fifteen dimensions of human annotations covering expressive attributes and expressive appropriateness. We further develop CEAEval-M, a model that integrates knowledge distillation, planner-based multi-model collaboration, adaptive audio attention bias, and reinforcement learning to perform context-rich expressive appropriateness evaluation. Experiments on a human-annotated test set demonstrate that CEAEval-M substantially outperforms existing speech evaluation and analysis systems.

Anthology ID: 2026.acl-long.411
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 9088–9106
URL: https://aclanthology.org/2026.acl-long.411/
Cite (ACL): Tianrui Wang, Ziyang Ma, Yizhou Peng, Haoyu Wang, Zhikang Niu, Zikang Huang, Yihao Wu, Yi-Wen Chao, Yu Jiang, Yuheng Lu, Guanrou Yang, Xuanchen Li, Hexin Liu, Chunyu Qiang, Cheng Gong, Yifan Yang, Tianchi Liu, Junyu Wang, Nana Hou, Meng Ge, Fuming You, Yang Wei, Zhongqian Sun, Hu Haifeng, Xiaobao Wang, Eng Siong Chng, Xie Chen, Longbiao Wang, and Jianwu Dang. 2026. Evaluating the Expressive Appropriateness of Speech in Rich Contexts. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9088–9106, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Evaluating the Expressive Appropriateness of Speech in Rich Contexts (Wang et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.411.pdf
Checklist: 2026.acl-long.411.checklist.pdf
