PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text

Hayeon Bang, Eunjin Choi, Megan Finch, Seungheon Doh, Seolhee Lee, Gyeong-Hoon Lee, Juhan Nam


Abstract
While piano music has become a significant area of study in Music Information Retrieval (MIR), there is a notable lack of datasets for piano solo music with text labels. To address this gap, we present PIAST (PIano dataset with Audio, Symbolic, and Text), a piano music dataset. Utilizing a piano-specific taxonomy of semantic tags, we collected 9,673 tracks from YouTube and added human annotations by music experts for 2,023 tracks, resulting in two subsets: PIAST-YT and PIAST-AT. Both include audio, text, tag annotations, and MIDI transcribed using state-of-the-art piano transcription and beat tracking models. Among the many tasks possible with this multimodal dataset, we conduct music tagging and retrieval using both audio and MIDI data and report baseline performances to demonstrate its potential as a valuable resource for MIR research.
Anthology ID:
2024.nlp4musa-1.2
Volume:
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
Month:
November
Year:
2024
Address:
Oakland, USA
Editors:
Anna Kruspe, Sergio Oramas, Elena V. Epure, Mohamed Sordo, Benno Weck, SeungHeon Doh, Minz Won, Ilaria Manco, Gabriel Meseguer-Brocal
Venues:
NLP4MusA | WS
Publisher:
Association for Computational Linguistics
Pages:
5–10
URL:
https://aclanthology.org/2024.nlp4musa-1.2/
Cite (ACL):
Hayeon Bang, Eunjin Choi, Megan Finch, Seungheon Doh, Seolhee Lee, Gyeong-Hoon Lee, and Juhan Nam. 2024. PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text. In Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA), pages 5–10, Oakland, USA. Association for Computational Linguistics.
Cite (Informal):
PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text (Bang et al., NLP4MusA 2024)
PDF:
https://aclanthology.org/2024.nlp4musa-1.2.pdf