Pardon? Evaluating Conversational Repair in Large Audio-Language Models

Shuanghong Huang; Jinlei Xu; Youchao Zhou; Yanghao Zhou (周杨浩); Xuan Zhao; Chong Feng (冯冲); Wenxuan Zhang

Abstract

Large Audio-Language Models (LALMs) have demonstrated strong performance in spoken question answering (QA), with existing evaluations primarily focusing on answer accuracy and robustness to acoustic perturbations. However, such evaluations implicitly assume that spoken inputs remain semantically answerable, an assumption that often fails in real-world interaction when essential information is missing. In this work, we introduce a repair-aware evaluation setting that explicitly distinguishes between answerable and unanswerable audio inputs. We define answerability as a property of the input itself and construct paired evaluation conditions using a semantic-acoustic masking protocol. Based on this setting, we propose the Evaluability Awareness and Repair (EAR) score, a non-compensatory metric that jointly evaluates task competence under answerable conditions and repair behavior under unanswerable conditions. Experiments on two spoken QA benchmarks across diverse LALMs reveal a consistent gap between answer accuracy and conversational reliability: while many models perform well when inputs are answerable, most fail to recognize semantic unanswerability and initiate appropriate conversational repair. These findings expose a limitation of prevailing accuracy-centric evaluation practices and motivate reliability assessments that treat unanswerable inputs as cues for repair and continued interaction. The core code and dataset are publicly available at https://github.com/sheunghung/EAR.

Anthology ID: 2026.findings-acl.976
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 19528–19541
URL: https://aclanthology.org/2026.findings-acl.976/
Cite (ACL): Shuanghong Huang, Jinlei Xu, Youchao Zhou, Yanghao Zhou, Xuan Zhao, Chong Feng, and Wenxuan Zhang. 2026. Pardon? Evaluating Conversational Repair in Large Audio-Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 19528–19541, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Pardon? Evaluating Conversational Repair in Large Audio-Language Models (Huang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.976.pdf
Checklist: 2026.findings-acl.976.checklist.pdf
