Jochen Schwenninger


2014

pdf bib
Exploiting the large-scale German Broadcast Corpus to boost the Fraunhofer IAIS Speech Recognition System
Michael Stadtschnitzer | Jochen Schwenninger | Daniel Stein | Joachim Koehler
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we describe the large-scale German broadcast corpus (GER-TV1000h) containing more than 1,000 hours of transcribed speech data. This corpus is unique in the German language corpora domain and enables significant progress in tuning the acoustic modelling of German large vocabulary continuous speech recognition (LVCSR) systems. The exploitation of this huge broadcast corpus is demonstrated by optimizing and improving the Fraunhofer IAIS speech recognition system. Due to the availability of huge amount of acoustic training data new training strategies are investigated. The performance of the automatic speech recognition (ASR) system is evaluated on several datasets and compared to previously published results. It can be shown that the word error rate (WER) using a larger corpus can be reduced by up to 9.1 % relative. By using both larger corpus and recent training paradigms the WER was reduced by up to 35.8 % relative and below 40 % absolute even for spontaneous dialectal speech in noisy conditions, making the ASR output a useful resource for subsequent tasks like named entity recognition also in difficult acoustic situations.

2010

pdf bib
DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain
Doris Baum | Daniel Schneider | Rolf Bardeli | Jochen Schwenninger | Barbara Samlowski | Thomas Winkler | Joachim Köhler
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Typical broadcast material contains not only studio-recorded texts read by trained speakers, but also spontaneous and dialect speech, debates with cross-talk, voice-overs, and on-site reports with difficult acoustic environments. Standard approaches to speech and speaker recognition usually deteriorate under such conditions. This paper reports on the design, construction, and experimental analysis of DiSCo, a German corpus for the evaluation of speech and speaker recognition on challenging material from the broadcast domain. One of the key requirements for the design of this corpus was a good coverage of different types of serious programmes beyond clean speech and planned speech broadcast news. Corpus annotation encompasses manual segmentation, an orthographic transcription, and labelling with speech mode, dialect, and noise type. We indicate typical use cases for the corpus by reporting results from ASR, speech search, and speaker recognition on the new corpus, thereby obtaining insights into the difficulty of audio recognition on the various classes.