Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

Chih-Kai Yang; Neo S. Ho; Hung-yi Lee

doi:10.18653/v1/2025.emnlp-main.514

Abstract

With advancements in large audio-language models (LALMs), which enhance large language models (LLMs) with auditory capabilities, these models are expected to demonstrate universal proficiency across various auditory tasks. While numerous benchmarks have emerged to assess LALMs’ performance, they remain fragmented and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations, categorizing them into four dimensions based on their objectives: (1) General Auditory Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed overviews within each category and highlight challenges in this field, offering insights into promising future directions. To the best of our knowledge, this is the first survey specifically focused on the evaluations of LALMs, providing clear guidelines for the community.

Anthology ID: 2025.emnlp-main.514
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 10144–10170
URL: https://aclanthology.org/2025.emnlp-main.514/
DOI: 10.18653/v1/2025.emnlp-main.514
Cite (ACL): Chih-Kai Yang, Neo S. Ho, and Hung-yi Lee. 2025. Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 10144–10170, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey (Yang et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.514.pdf
Checklist: 2025.emnlp-main.514.checklist.pdf
