Jiajia Li
2024
The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models
Jiajia Li
|
Lu Yang
|
Mingni Tang
|
Chenchong Chenchong
|
Zuchao Li
|
Ping Wang
|
Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2024
Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs’ capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-related capabilities of LLMs.ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries. By leveraging ZIQI-Eval, we conduct a comprehensive evaluation over 16 LLMs to evaluate and analyze LLMs’ performance in the domain of music.Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities.With ZIQI-Eval, we aim to provide a standardized and robust evaluation framework that facilitates a comprehensive assessment of LLMs’ music-related abilities. The dataset is available at GitHub and HuggingFace.
Search
Co-authors
- Lu Yang 1
- Mingni Tang 1
- Chenchong Chenchong 1
- Zuchao Li 1
- Ping Wang 1
- show all...
- Hai Zhao 1