A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation

Robert Herms; Laura Seelig; Stefanie Münch; Maximilian Eibl

Abstract

In this paper, we present SXUCorpus, a corpus of read and spontaneous speech in the Upper Saxon German dialect. The data were collected from eight archives of local television stations located in the Free State of Saxony. The recordings cover broadcast topics of news, economy, weather, sport, and documentation from the years 1992 to 1996 and have been manually transcribed and labeled. We report the methodology for collecting and processing the analog audiovisual material and for constructing the corpus, and we describe the properties of the data. In its current version, the corpus is available to the scientific community and is designed for automatic speech recognition (ASR) evaluation, with a development set and a test set. We performed ASR experiments on the dataset with the open-source framework sphinx-4, including a configuration for Standard German. Additionally, we show the influence of acoustic model and language model adaptation using the development set.

Anthology ID: L16-1736
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 4648–4651
URL: https://aclanthology.org/L16-1736/
Cite (ACL): Robert Herms, Laura Seelig, Stefanie Münch, and Maximilian Eibl. 2016. A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4648–4651, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): A Corpus of Read and Spontaneous Upper Saxon German Speech for ASR Evaluation (Herms et al., LREC 2016)
PDF: https://aclanthology.org/L16-1736.pdf
