VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents

Jiliang Hu; Wenfu Wang; Zuchao Li; Chenxing Li; Yiyang Zhao; Hanzhao Li; Liqiang Zhang; Meng Yu; Dong Yu (于东)

Abstract

While large audio language models (LALMs) have driven significant progress in multimodal conversational systems, current benchmarks suffer from critical limitations: they are largely English-centric, use synthetic speech, and fail to provide comprehensive, discriminative evaluation across key dimensions. To fill this gap, we present Voice Chat Bot Bench (VCB Bench), a novel, high-quality Chinese benchmark built exclusively on real human speech. VCB Bench assesses LALMs across three complementary axes: instruction following (including speech-level control beyond text commands), knowledge understanding (including general knowledge, reasoning, and daily dialogue), and robustness (evaluating stability under variations in content, environment, and speaker characteristics). Experiments conducted on representative LALMs reveal notable performance disparities and offer tangible insights for future improvements. VCB Bench serves as a reproducible and fine-grained framework, providing standardized evaluation and practical guidance for the development of Chinese voice conversational models.

Anthology ID: 2026.findings-acl.1659
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 33176–33200
URL: https://aclanthology.org/2026.findings-acl.1659/
Cite (ACL): Jiliang Hu, Wenfu Wang, Zuchao Li, Chenxing Li, Yiyang Zhao, Hanzhao Li, Liqiang Zhang, Meng Yu, and Dong Yu. 2026. VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 33176–33200, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents (Hu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1659.pdf
Checklist: 2026.findings-acl.1659.checklist.pdf
