The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

Jiajia Li; Lu Yang; Mingni Tang; Cong Chen; Zuchao Li; Ping Wang; Hai Zhao

doi:10.18653/v1/2024.findings-acl.194

Abstract

Benchmarks play a pivotal role in assessing the advancement of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs’ capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs. ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. Using ZIQI-Eval, we evaluate 16 LLMs and analyze their performance in the domain of music. Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities. With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs’ music-related abilities. The dataset is available on GitHub and HuggingFace.

Anthology ID: 2024.findings-acl.194
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3246–3257
URL: https://aclanthology.org/2024.findings-acl.194/
DOI: 10.18653/v1/2024.findings-acl.194
Cite (ACL): Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, and Hai Zhao. 2024. The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 3246–3257, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models (Li et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.194.pdf
