FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation

Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, Julian McAuley


Abstract
We propose FUTGA, a model equipped with fine-grained music understanding capabilities, learned through generative augmentation with temporal compositions. We leverage existing music caption datasets and large language models (LLMs) to synthesize fine-grained music captions with structural descriptions and time boundaries for full-length songs. Trained on the proposed synthetic dataset, FUTGA can identify a song's temporal changes at key transition points and their musical functions, and generate detailed descriptions for each music segment. We further introduce a full-length music caption dataset generated by FUTGA as an augmentation of the MusicCaps and Song Describer datasets. Experiments demonstrate the improved quality of the generated captions, which capture the time boundaries of long-form music.
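To make the notion of a temporally enhanced caption concrete, the sketch below illustrates one plausible structure for such an annotation, with per-segment time boundaries, a musical function label, and a free-text description. The schema and field names are assumptions for illustration only and are not the dataset's actual format.

```python
# Hypothetical sketch of a temporally segmented caption record, following the
# abstract's description (time boundaries, musical function, per-segment text).
# Field names and values are illustrative assumptions, not FUTGA's real schema.
song_caption = {
    "song_id": "example_song",
    "duration_sec": 212.0,
    "segments": [
        {
            "start_sec": 0.0,
            "end_sec": 18.5,
            "function": "intro",
            "description": "Sparse piano arpeggios over a soft synth pad.",
        },
        {
            "start_sec": 18.5,
            "end_sec": 52.0,
            "function": "verse",
            "description": "A steady drum groove enters beneath a mellow vocal.",
        },
    ],
}

# A full-length caption could then be assembled by concatenating the
# per-segment descriptions together with their time boundaries.
for seg in song_caption["segments"]:
    print(f'[{seg["start_sec"]:.1f}-{seg["end_sec"]:.1f}s] '
          f'({seg["function"]}) {seg["description"]}')
```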
Anthology ID:
2024.nlp4musa-1.17
Volume:
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
Month:
November
Year:
2024
Address:
Oakland, USA
Editors:
Anna Kruspe, Sergio Oramas, Elena V. Epure, Mohamed Sordo, Benno Weck, SeungHeon Doh, Minz Won, Ilaria Manco, Gabriel Meseguer-Brocal
Venues:
NLP4MusA | WS
Publisher:
Association for Computational Linguistics
Pages:
107–111
URL:
https://aclanthology.org/2024.nlp4musa-1.17/
Cite (ACL):
Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, and Julian McAuley. 2024. FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation. In Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA), pages 107–111, Oakland, USA. Association for Computational Linguistics.
Cite (Informal):
FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation (Wu et al., NLP4MusA 2024)
PDF:
https://aclanthology.org/2024.nlp4musa-1.17.pdf