Meinard Müller


2024

Lyrics Transcription in Western Classical Music with Whisper: A Case Study on Schubert’s Winterreise
Hans-Ulrich Berendes | Simon Schwär | Meinard Müller
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)

Automatic Lyrics Transcription (ALT) aims to transcribe sung words from music recordings and is closely related to Automatic Speech Recognition (ASR). Although not specifically designed for lyrics transcription, the state-of-the-art ASR model Whisper has recently proven effective for ALT and various related tasks in music information retrieval (MIR). This paper investigates Whisper’s performance on Western classical music, using the “Schubert Winterreise Dataset.” In particular, we found that the average Word Error Rate (WER) with the unmodified Whisper model is 0.56 for this dataset, while the performance varies greatly across songs and versions. In contrast, spoken versions of the song lyrics, which we recorded, are transcribed with a WER of 0.14. Further systematic experiments with source separation and time-scale modification techniques indicate that Whisper’s accuracy in lyrics transcription is affected less by the musical accompaniment than by the singing style.