Towards Reliable Large Audio Language Model

Ziyang Ma; Xiquan Li; Yakun Song; Wenxi Chen; Chenpeng Du; Jian Wu; Yuanzhe Chen; Zhuo Chen; Yuping Wang; Yuxuan Wang; Xie Chen

doi:10.18653/v1/2025.findings-acl.56

Abstract

Recent advancements in large audio language models (LALMs) have demonstrated impressive results and promising prospects in universal understanding and reasoning across speech, music, and general sound. However, these models still lack the ability to recognize their knowledge boundaries and to proactively refuse questions they cannot answer. While there have been successful attempts to enhance the reliability of LLMs, reliable LALMs remain largely unexplored. In this paper, we systematically investigate various approaches towards reliable LALMs, including training-free methods such as multi-modal chain-of-thought (MCoT) and training-based methods such as supervised fine-tuning (SFT). In addition, we identify the limitations of previous evaluation metrics and propose a new metric, the Reliability Gain Index (RGI), to assess the effectiveness of different reliability methods. Our findings suggest that both training-free and training-based methods enhance the reliability of LALMs to different extents. Moreover, we find that awareness of reliability is a “meta ability” that can be transferred across different audio modalities, even though significant structural and content differences exist among sound, music, and speech.

Anthology ID: 2025.findings-acl.56
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1000–1014
URL: https://aclanthology.org/2025.findings-acl.56/
DOI: 10.18653/v1/2025.findings-acl.56
Cite (ACL): Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, Jian Wu, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, and Xie Chen. 2025. Towards Reliable Large Audio Language Model. In Findings of the Association for Computational Linguistics: ACL 2025, pages 1000–1014, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Towards Reliable Large Audio Language Model (Ma et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.56.pdf
