Zachary Novack
2024
FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
Junda Wu | Zachary Novack | Amit Namburi | Jiaheng Dai | Hao-Wen Dong | Zhouhang Xie | Carol Chen | Julian McAuley
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
We propose FUTGA, a model equipped with fine-grained music understanding capabilities through learning from generative augmentation with temporal compositions. We leverage existing music caption datasets and large language models (LLMs) to synthesize fine-grained music captions with structural descriptions and time boundaries for full-length songs. With augmentation from the proposed synthetic dataset, FUTGA can identify the music’s temporal changes at key transition points and their musical functions, and generate detailed descriptions for each music segment. We further introduce a full-length music caption dataset generated by FUTGA, which augments the MusicCaps and Song Describer datasets. The experiments demonstrate the improved quality of the generated captions, which capture the time boundaries of long-form music.