Lyrics Transcription in Western Classical Music with Whisper: A Case Study on Schubert’s Winterreise

Hans-Ulrich Berendes; Simon Schwär; Meinard Müller

Abstract

Automatic Lyrics Transcription (ALT) aims to transcribe sung words from music recordings and is closely related to Automatic Speech Recognition (ASR). Although not specifically designed for lyrics transcription, the state-of-the-art ASR model Whisper has recently proven effective for ALT and various related tasks in music information retrieval (MIR). This paper investigates Whisper’s performance on Western classical music, using the “Schubert Winterreise Dataset.” In particular, we found that the average Word Error Rate (WER) with the unmodified Whisper model is 0.56 for this dataset, while the performance varies greatly across songs and versions. In contrast, spoken versions of the song lyrics, which we recorded, are transcribed with a WER of 0.14. Further systematic experiments with source separation and time-scale modification techniques indicate that Whisper’s accuracy in lyrics transcription is less affected by the musical accompaniment and more by the singing style.
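The Word Error Rate figures quoted above follow the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference text and a transcript, divided by the number of reference words. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example hypothesis string are assumptions made for illustration; the abstract does not state which text normalization was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Opening line of "Gute Nacht" as reference; the hypothesis is a made-up
# transcript with one substituted word and one missing word.
reference = "Fremd bin ich eingezogen fremd zieh ich wieder aus"
hypothesis = "Fremd bin ich eingezogen fremd sie ich wieder"
wer = word_error_rate(reference, hypothesis)
```

Because insertions are counted against the reference length, a WER computed this way can exceed 1.0 for transcripts that are much longer than the reference.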

Anthology ID: 2024.nlp4musa-1.3
Volume: Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
Month: November
Year: 2024
Address: Oakland, USA
Editors: Anna Kruspe, Sergio Oramas, Elena V. Epure, Mohamed Sordo, Benno Weck, SeungHeon Doh, Minz Won, Ilaria Manco, Gabriel Meseguer-Brocal
Venues: NLP4MusA | WS
Publisher: Association for Computational Linguistics
Pages: 11–16
URL: https://aclanthology.org/2024.nlp4musa-1.3/
Cite (ACL): Hans-Ulrich Berendes, Simon Schwär, and Meinard Müller. 2024. Lyrics Transcription in Western Classical Music with Whisper: A Case Study on Schubert’s Winterreise. In Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA), pages 11–16, Oakland, USA. Association for Computational Linguistics.
Cite (Informal): Lyrics Transcription in Western Classical Music with Whisper: A Case Study on Schubert’s Winterreise (Berendes et al., NLP4MusA 2024)
PDF: https://aclanthology.org/2024.nlp4musa-1.3.pdf
