The 2014 KIT IWSLT speech-to-text systems for English, German and Italian

Kevin Kilgour, Michael Heck, Markus Müller, Matthias Sperber, Sebastian Stüker, Alex Waibel


Abstract
This paper describes our German, Italian and English Speech-to-Text (STT) systems for the 2014 IWSLT TED ASR track. Our setup combines the outputs of various subsystems using ROVER and confusion network combination to achieve good overall performance. The individual subsystems are built by using different front-ends (e.g., MVDR-MFCC or lMel), acoustic models (GMM or modular DNN), and phone sets, and by training on various subsets of the training data. Decoding is performed in two stages, where the GMM systems are adapted in an unsupervised manner on the combination of the first-stage outputs using VTLN, MLLR, and cMLLR. The combination setup produces a final hypothesis that has a significantly lower word error rate (WER) than any of the individual subsystems.
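As a rough illustration of the voting idea behind ROVER-style combination, the following is a minimal sketch and not the authors' implementation. It assumes the subsystem hypotheses have already been aligned into equal-length word sequences, whereas real ROVER builds this alignment itself by dynamic programming. The names `NULL` and `vote` are hypothetical, and confidence scores are ignored.

```python
from collections import Counter

# Hypothetical marker for "this system produced no word in this slot".
NULL = ""


def vote(aligned_hyps):
    """Majority vote over pre-aligned hypotheses.

    aligned_hyps: list of word lists, one per subsystem, all of equal
    length (the alignment step is assumed to have been done elsewhere).
    Returns the combined word sequence with NULL winners dropped.
    """
    if not aligned_hyps:
        return []
    length = len(aligned_hyps[0])
    if any(len(h) != length for h in aligned_hyps):
        raise ValueError("hypotheses must be pre-aligned to equal length")

    combined = []
    for slot in zip(*aligned_hyps):
        # Most frequent entry in this slot; on a tie, the entry from the
        # earliest-listed system wins (Counter keeps insertion order).
        word, _ = Counter(slot).most_common(1)[0]
        if word != NULL:
            combined.append(word)
    return combined


hyps = [
    ["the", "cat", "sat", NULL],
    ["the", "cat", "sat", "down"],
    ["a", "cat", "sat", NULL],
]
print(vote(hyps))
```

In this toy input, two of the three systems agree in every slot, so the combined output follows the majority and drops the word that only one system proposed.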
Anthology ID:
2014.iwslt-evaluation.9
Volume:
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
Month:
December 4-5
Year:
2014
Address:
Lake Tahoe, California
Venue:
IWSLT
SIG:
SIGSLT
Pages:
73–79
URL:
https://aclanthology.org/2014.iwslt-evaluation.9
Cite (ACL):
Kevin Kilgour, Michael Heck, Markus Müller, Matthias Sperber, Sebastian Stüker, and Alex Waibel. 2014. The 2014 KIT IWSLT speech-to-text systems for English, German and Italian. In Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 73–79, Lake Tahoe, California.
Cite (Informal):
The 2014 KIT IWSLT speech-to-text systems for English, German and Italian (Kilgour et al., IWSLT 2014)
PDF:
https://aclanthology.org/2014.iwslt-evaluation.9.pdf