@inproceedings{harada-etal-2024-impressions,
title = "Can Impressions of Music be Extracted from Thumbnail Images?",
author = "Harada, Takashi and
Motomitsu, Takehiro and
Hayashi, Katsuhiko and
Sakai, Yusuke and
Kamigaito, Hidetaka",
editor = "Kruspe, Anna and
Oramas, Sergio and
Epure, Elena V. and
Sordo, Mohamed and
Weck, Benno and
Doh, SeungHeon and
Won, Minz and
Manco, Ilaria and
Meseguer-Brocal, Gabriel",
booktitle = "Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)",
month = nov,
year = "2024",
address = "Oakland, USA",
publisher = "Association for Computational Lingustics",
url = "https://aclanthology.org/2024.nlp4musa-1.9/",
pages = "49--56",
abstract = "In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems that are capable of taking natural language sentences as inputs. However, there is a scarcity of large-scale publicly available datasets, consisting of music data and their corresponding natural language descriptions known as music captions. In particular, non-musical information such as suitable situations for listening to a track and the emotions elicited upon listening is crucial for describing music. This type of information is underrepresented in existing music caption datasets due to the challenges associated with extracting it directly from music data. To address this issue, we propose a method for generating music caption data that incorporates non-musical aspects inferred from music thumbnail images, and validated the effectiveness of our approach through human evaluations."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="harada-etal-2024-impressions">
<titleInfo>
<title>Can Impressions of Music be Extracted from Thumbnail Images?</title>
</titleInfo>
<name type="personal">
<namePart type="given">Takashi</namePart>
<namePart type="family">Harada</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Takehiro</namePart>
<namePart type="family">Motomitsu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Katsuhiko</namePart>
<namePart type="family">Hayashi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yusuke</namePart>
<namePart type="family">Sakai</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hidetaka</namePart>
<namePart type="family">Kamigaito</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2024-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Anna</namePart>
<namePart type="family">Kruspe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sergio</namePart>
<namePart type="family">Oramas</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Elena</namePart>
<namePart type="given">V</namePart>
<namePart type="family">Epure</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohamed</namePart>
<namePart type="family">Sordo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Benno</namePart>
<namePart type="family">Weck</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">SeungHeon</namePart>
<namePart type="family">Doh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Minz</namePart>
<namePart type="family">Won</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ilaria</namePart>
<namePart type="family">Manco</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gabriel</namePart>
<namePart type="family">Meseguer-Brocal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Oakland, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems that are capable of taking natural language sentences as inputs. However, there is a scarcity of large-scale publicly available datasets consisting of music data and their corresponding natural language descriptions, known as music captions. In particular, non-musical information such as suitable situations for listening to a track and the emotions elicited upon listening is crucial for describing music. This type of information is underrepresented in existing music caption datasets due to the challenges associated with extracting it directly from music data. To address this issue, we propose a method for generating music caption data that incorporates non-musical aspects inferred from music thumbnail images, and validate the effectiveness of our approach through human evaluations.</abstract>
<identifier type="citekey">harada-etal-2024-impressions</identifier>
<location>
<url>https://aclanthology.org/2024.nlp4musa-1.9/</url>
</location>
<part>
<date>2024-11</date>
<extent unit="page">
<start>49</start>
<end>56</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Can Impressions of Music be Extracted from Thumbnail Images?
%A Harada, Takashi
%A Motomitsu, Takehiro
%A Hayashi, Katsuhiko
%A Sakai, Yusuke
%A Kamigaito, Hidetaka
%Y Kruspe, Anna
%Y Oramas, Sergio
%Y Epure, Elena V.
%Y Sordo, Mohamed
%Y Weck, Benno
%Y Doh, SeungHeon
%Y Won, Minz
%Y Manco, Ilaria
%Y Meseguer-Brocal, Gabriel
%S Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
%D 2024
%8 November
%I Association for Computational Linguistics
%C Oakland, USA
%F harada-etal-2024-impressions
%X In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems that are capable of taking natural language sentences as inputs. However, there is a scarcity of large-scale publicly available datasets consisting of music data and their corresponding natural language descriptions, known as music captions. In particular, non-musical information such as suitable situations for listening to a track and the emotions elicited upon listening is crucial for describing music. This type of information is underrepresented in existing music caption datasets due to the challenges associated with extracting it directly from music data. To address this issue, we propose a method for generating music caption data that incorporates non-musical aspects inferred from music thumbnail images, and validate the effectiveness of our approach through human evaluations.
%U https://aclanthology.org/2024.nlp4musa-1.9/
%P 49-56
Markdown (Informal)
[Can Impressions of Music be Extracted from Thumbnail Images?](https://aclanthology.org/2024.nlp4musa-1.9/) (Harada et al., NLP4MusA 2024)
ACL
- Takashi Harada, Takehiro Motomitsu, Katsuhiko Hayashi, Yusuke Sakai, and Hidetaka Kamigaito. 2024. Can Impressions of Music be Extracted from Thumbnail Images? In Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA), pages 49–56, Oakland, USA. Association for Computational Linguistics.