Sebastian Stüker

Also published as: Sebastian Stueker, Sebastian Stuker


2021

ELITR Multilingual Live Subtitling: Demo and Strategy
Ondřej Bojar | Dominik Macháček | Sangeet Sagar | Otakar Smrž | Jonáš Kratochvíl | Peter Polák | Ebrahim Ansari | Mohammad Mahmoudi | Rishu Kumar | Dario Franceschini | Chiara Canton | Ivan Simonini | Thai-Son Nguyen | Felix Schneider | Sebastian Stüker | Alex Waibel | Barry Haddow | Rico Sennrich | Philip Williams
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

This paper presents an automatic speech translation system aimed at live subtitling of conference presentations. We describe the overall architecture and key processing components. More importantly, we explain our strategy for building a complex system for end-users from numerous individual components, each of which has been tested only in laboratory conditions. The system is a working prototype that is routinely tested in recognizing English, Czech, and German speech and presenting it translated simultaneously into 42 target languages.

Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
Marcello Federico | Alex Waibel | Marta R. Costa-jussà | Jan Niehues | Sebastian Stuker | Elizabeth Salesky
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN
Antonios Anastasopoulos | Ondřej Bojar | Jacob Bremerman | Roldano Cattoni | Maha Elbayad | Marcello Federico | Xutai Ma | Satoshi Nakamura | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Alexander Waibel | Changhan Wang | Matthew Wiesner
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, and (iv) Low-resource speech translation. A total of 22 teams participated in at least one of the tasks. This paper describes each shared task, its data and evaluation metrics, and reports the results of the received submissions.

KIT’s IWSLT 2021 Offline Speech Translation System
Tuan Nam Nguyen | Thai Son Nguyen | Christian Huber | Ngoc-Quan Pham | Thanh-Le Ha | Felix Schneider | Sebastian Stüker
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes KIT’s submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both the cascaded condition and the end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, we reused last year’s neural machine translation model. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.

Multilingual Speech Translation KIT @ IWSLT2021
Ngoc-Quan Pham | Tuan Nam Nguyen | Thanh-Le Ha | Sebastian Stüker | Alexander Waibel | Dan He
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the submission of the Karlsruhe Institute of Technology (KIT) to the multilingual TEDx translation task in the IWSLT 2021 evaluation campaign. Our main approach is to develop both cascade and end-to-end systems and eventually combine them to achieve the best possible results for this extremely low-resource setting. The report also confirms certain consistent architectural improvements added to the Transformer architecture for all tasks: translation, transcription and speech translation.

2020

Supervised Adaptation of Sequence-to-Sequence Speech Recognition Systems using Batch-Weighting
Christian Huber | Juan Hussain | Tuan-Nam Nguyen | Kaihang Song | Sebastian Stüker | Alexander Waibel
Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems

When training speech recognition systems, one often faces the situation that sufficient amounts of training data for the language in question are available but only small amounts of data for the domain in question. This problem is even bigger for end-to-end speech recognition systems that only accept transcribed speech as training data, which is harder and more expensive to obtain than text data. In this paper we present experiments in adapting end-to-end speech recognition systems by a method called batch-weighting, which we contrast against regular fine-tuning, i.e., continuing to train existing neural speech recognition models on adaptation data. We perform experiments using these techniques in adapting to topic, accent and vocabulary, showing that batch-weighting consistently outperforms fine-tuning. In order to show the generalization capabilities of batch-weighting we perform experiments in several languages, i.e., Arabic, English and German. Due to its relatively small computational requirements, batch-weighting is a suitable technique for supervised life-long learning during the life-time of a speech recognition system, e.g., from user corrections.

German-Arabic Speech-to-Speech Translation for Psychiatric Diagnosis
Juan Hussain | Mohammed Mediani | Moritz Behr | M. Amin Cheragui | Sebastian Stüker | Alexander Waibel
Proceedings of the Fifth Arabic Natural Language Processing Workshop

In this paper we present the natural language processing components of our German-Arabic speech-to-speech translation system, which is being deployed in the context of interpretation during psychiatric diagnostic interviews. For this purpose we have built a pipelined speech-to-speech translation system consisting of automatic speech recognition, text post-processing/segmentation, machine translation and speech synthesis systems. We have implemented two pipelines, from German to Arabic and Arabic to German, in order to be able to conduct interpreted two-way dialogues between psychiatrists and potential patients. All systems in our pipeline have been realized as all-neural end-to-end systems, using different architectures suitable for the different components. The speech recognition systems use an encoder/decoder + attention architecture, the text segmentation component and the machine translation system are based on the Transformer architecture, and for the speech synthesis systems we use Tacotron 2 for generating spectrograms and WaveGlow as vocoder. The speech translation is deployed in a server-based speech translation application that implements turn-based translation between a German-speaking psychiatrist administering the Mini-International Neuropsychiatric Interview (M.I.N.I.) and an Arabic-speaking person answering the interview. As this is a very specific domain, in addition to the linguistic challenges posed by translating between Arabic and German, we also focus in this paper on the methods we implemented for adapting our speech translation system to the domain of this psychiatric interview.

DaCToR: A Data Collection Tool for the RELATER Project
Juan Hussain | Oussama Zenkri | Sebastian Stüker | Alex Waibel
Proceedings of the 12th Language Resources and Evaluation Conference

Collecting domain-specific data for under-resourced languages, e.g., dialects of languages, can be very expensive, potentially financially prohibitive, and time-consuming. Moreover, in the case of rarely written languages, the normalization of non-canonical transcription might be another time-consuming but necessary task. In order to collect domain-specific data in such circumstances in a time- and cost-efficient way, collecting read data of pre-prepared texts is often a viable option. In order to collect data in the domain of psychiatric diagnosis in Arabic dialects for the project RELATER, we have prepared the data collection tool DaCToR for collecting read texts by speakers in the respective countries and districts in which the dialects are spoken. In this paper we describe our tool, its purpose within the project RELATER and the dialects which we have started to collect with the tool.

Proceedings of the 17th International Conference on Spoken Language Translation
Marcello Federico | Alex Waibel | Kevin Knight | Satoshi Nakamura | Hermann Ney | Jan Niehues | Sebastian Stüker | Dekai Wu | Joseph Mariani | Francois Yvon
Proceedings of the 17th International Conference on Spoken Language Translation

FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN
Ebrahim Ansari | Amittai Axelrod | Nguyen Bach | Ondřej Bojar | Roldano Cattoni | Fahim Dalvi | Nadir Durrani | Marcello Federico | Christian Federmann | Jiatao Gu | Fei Huang | Kevin Knight | Xutai Ma | Ajay Nagesh | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Xing Shi | Sebastian Stüker | Marco Turchi | Alexander Waibel | Changhan Wang
Proceedings of the 17th International Conference on Spoken Language Translation

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured six challenge tracks this year: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation. A total of teams participated in at least one of the tracks. This paper introduces each track’s goal, data and evaluation metrics, and reports the results of the received submissions.

KIT’s IWSLT 2020 SLT Translation System
Ngoc-Quan Pham | Felix Schneider | Tuan-Nam Nguyen | Thanh-Le Ha | Thai Son Nguyen | Maximilian Awiszus | Sebastian Stüker | Alexander Waibel
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes KIT’s submissions to the IWSLT2020 Speech Translation evaluation campaign. We first participate in the simultaneous translation task, in which our simultaneous models are Transformer based and can be efficiently trained to obtain low latency with minimal compromise in quality. On the offline speech translation task, we applied our new Speech Transformer architecture to end-to-end speech translation. The obtained model can provide translation quality which is competitive with a complicated cascade. The latter still has the upper hand, thanks to the ability to transparently access the transcription and resegment the inputs to avoid fragmentation.

Removing European Language Barriers with Innovative Machine Translation Technology
Dario Franceschini | Chiara Canton | Ivan Simonini | Armin Schweinfurth | Adelheid Glott | Sebastian Stüker | Thai-Son Nguyen | Felix Schneider | Thanh-Le Ha | Alex Waibel | Barry Haddow | Philip Williams | Rico Sennrich | Ondřej Bojar | Sangeet Sagar | Dominik Macháček | Otakar Smrž
Proceedings of the 1st International Workshop on Language Technology Platforms

This paper presents our progress towards deploying a versatile communication platform in the task of highly multilingual live speech translation for live subtitling of conferences and remote meetings. The platform has been designed with a focus on very low latency and high flexibility while allowing research prototypes of speech and text processing tools to be easily connected, regardless of where they physically run. We outline our architecture solution and also briefly compare it with the ELG platform. Technical details are provided on the most important components and we summarize the test deployment events we have run so far.

2018

KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning
Florian Dessloch | Thanh-Le Ha | Markus Müller | Jan Niehues | Thai-Son Nguyen | Ngoc-Quan Pham | Elizabeth Salesky | Matthias Sperber | Sebastian Stüker | Thomas Zenkel | Alexander Waibel
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

In today’s globalized world we have the ability to communicate with people across the world. However, in many situations the language barrier still presents a major issue. For example, many foreign students coming to KIT to study are initially unable to follow a lecture in German. Therefore, we offer an automatic simultaneous interpretation service for students. To fulfill this task, we have developed a low-latency translation system that is adapted to lectures and covers several language pairs. While the switch from traditional Statistical Machine Translation to Neural Machine Translation (NMT) significantly improved performance, integrating NMT into the speech translation framework required several adjustments. We have addressed the run-time constraints and different types of input. Furthermore, we utilized one-shot learning to easily add new topic-specific terms to the system. Besides better performance, NMT also enabled us to increase the number of covered languages through multilingual NMT. Combining these techniques, we are able to provide an adapted speech translation system for several European languages.

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Pierre Godard | Gilles Adda | Martine Adda-Decker | Juan Benjumea | Laurent Besacier | Jamison Cooper-Leavitt | Guy-Noel Kouarata | Lori Lamel | Hélène Maynard | Markus Mueller | Annie Rialland | Sebastian Stueker | François Yvon | Marcely Zanon-Boito
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

BULBasaa: A Bilingual Basaa-French Speech Corpus for the Evaluation of Language Documentation Tools
Fatima Hamlaoui | Emmanuel-Moselly Makasso | Markus Müller | Jonas Engelmann | Gilles Adda | Alex Waibel | Sebastian Stüker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Lightly Supervised Quality Estimation
Matthias Sperber | Graham Neubig | Jan Niehues | Sebastian Stüker | Alex Waibel
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Evaluating the quality of output from language processing systems such as machine translation or speech recognition is an essential step in ensuring that they are sufficient for practical use. However, depending on the practical requirements, evaluation approaches can differ strongly. Often, reference-based evaluation measures (such as BLEU or WER) are appealing because they are cheap and allow rapid quantitative comparison. On the other hand, practitioners often focus on manual evaluation because they must deal with frequently changing domains and quality standards requested by customers, for which reference-based evaluation is insufficient or not possible due to missing in-domain reference data (Harris et al., 2016). In this paper, we attempt to bridge this gap by proposing a framework for lightly supervised quality estimation. We collect manually annotated scores for a small number of segments in a test corpus or document, and combine them with automatically predicted quality scores for the remaining segments to predict an overall quality estimate. An evaluation shows that our framework estimates quality more reliably than using fully automatic quality estimation approaches, while keeping annotation effort low by not requiring full references to be available for the particular domain.
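As a rough illustration of the combination step this abstract describes, the sketch below mixes manual scores for a few annotated segments with predicted scores for the remaining ones. The function name `overall_quality`, the length weighting, and the sample numbers are assumptions made for illustration; the paper's actual estimator may differ.

```python
from typing import Dict, List


def overall_quality(
    predicted: List[float],
    manual: Dict[int, float],
    lengths: List[int],
) -> float:
    """Length-weighted mean of per-segment quality scores.

    A manual score, where one exists for a segment index, replaces the
    automatically predicted score for that segment (illustrative assumption).
    """
    if len(predicted) != len(lengths):
        raise ValueError("predicted and lengths must have the same size")
    total = sum(lengths)
    if total == 0:
        raise ValueError("document has no tokens")
    weighted = 0.0
    for i, (score, n) in enumerate(zip(predicted, lengths)):
        weighted += manual.get(i, score) * n
    return weighted / total


# Toy data: four segments, one of which (index 2) was scored by hand.
predicted = [0.5, 0.75, 0.25, 1.0]
manual = {2: 0.75}
lengths = [10, 10, 20, 10]
estimate = overall_quality(predicted, manual, lengths)
automatic_only = overall_quality(predicted, {}, lengths)
```

In this toy case the single manual annotation on the longest segment moves the document-level estimate noticeably, which is the intuition behind spending annotation effort on only a few segments.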

Lecture Translator - Speech translation framework for simultaneous lecture translation
Markus Müller | Thai Son Nguyen | Jan Niehues | Eunah Cho | Bastian Krüger | Thanh-Le Ha | Kevin Kilgour | Matthias Sperber | Mohammed Mediani | Sebastian Stüker | Alex Waibel
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Evaluation of the KIT Lecture Translation System
Markus Müller | Sarah Fünfer | Sebastian Stüker | Alex Waibel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Attracting foreign students is among the goals of the Karlsruhe Institute of Technology (KIT). One obstacle to achieving this goal is that lectures at KIT are usually held in German, in which many foreign students are not sufficiently proficient, as opposed to, e.g., English. While the students from abroad are learning German during their stay at KIT, it is challenging to become proficient enough in it to follow a lecture. As a solution to this problem we offer our automatic simultaneous lecture translation. It translates German lectures into English in real time. While not as good as human interpreters, the system is available at a price that KIT can afford in order to offer it in potentially all lectures. In order to assess the quality of the system we have conducted a user study. In this paper we present this study, the way it was conducted and its results. The results indicate that the quality of the system has passed a threshold that enables it to support students in their studies. The study has helped to identify the most crucial weaknesses of the system and has guided us as to which steps to take next.

2015

The IWSLT 2015 Evaluation Campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Roldano Cattoni | Marcello Federico
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

The 2015 KIT IWSLT speech-to-text systems for English and German
Markus Mueller | Tai Son Nguyen | Matthias Sperber | Kevin Kilgour | Sebastian Stuker | Alex Waibel
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

Evaluation of Crowdsourced User Input Data for Spoken Dialog Systems
Maria Schmidt | Markus Müller | Martin Wagner | Sebastian Stüker | Alex Waibel | Hansjörg Hofmann | Steffen Werner
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
Marcello Federico | Sebastian Stüker | François Yvon
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

Report on the 11th IWSLT evaluation campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Marcello Federico
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

The paper overviews the 11th evaluation campaign organized by the IWSLT workshop. The 2014 evaluation offered multiple tracks on lecture transcription and translation based on the TED Talks corpus. In particular, this year IWSLT included three automatic speech recognition tracks, on English, German and Italian, five speech translation tracks, from English to French, English to German, German to English, English to Italian, and Italian to English, and five text translation tracks, also from English to French, English to German, German to English, English to Italian, and Italian to English. In addition to the official tracks, speech and text translation optional tracks were offered, globally involving 12 other languages: Arabic, Spanish, Portuguese (B), Hebrew, Chinese, Polish, Persian, Slovenian, Turkish, Dutch, Romanian, Russian. Overall, 21 teams participated in the evaluation, for a total of 76 primary runs submitted. Participants were also asked to submit runs on the 2013 test set (progress test set), in order to measure the progress of systems with respect to the previous year. All runs were evaluated with objective metrics, and submissions for two of the official text translation tracks were also evaluated with human post-editing.

The 2014 KIT IWSLT speech-to-text systems for English, German and Italian
Kevin Kilgour | Michael Heck | Markus Müller | Matthias Sperber | Sebastian Stüker | Alex Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes our German, Italian and English Speech-to-Text (STT) systems for the 2014 IWSLT TED ASR track. Our setup uses ROVER and confusion network combination from various subsystems to achieve a good overall performance. The individual subsystems are built by using different front-ends (e.g., MVDR-MFCC or lMel), acoustic models (GMM or modular DNN) and phone sets, and by training on various subsets of the training data. Decoding is performed in two stages, where the GMM systems are adapted in an unsupervised manner on the combination of the first stage outputs using VTLN, MLLR, and cMLLR. The combination setup produces a final hypothesis that has a significantly lower WER than any of the individual subsystems.

Multilingual deep bottle neck features: a study on language selection and training techniques
Markus Müller | Sebastian Stüker | Zaid Sheikh | Florian Metze | Alex Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

Previous work has shown that training the neural networks for bottle neck feature extraction in a multilingual way can lead to improvements in word error rate and average term weighted value in a telephone key word search task. In this work we conduct a systematic study on a) which multilingual training strategy to employ, b) the effect of language selection and amount of multilingual training data used and c) how to find a suitable combination of languages. We conducted our experiments on the key word search task and the languages of the IARPA BABEL program. In a first step, we assessed the performance of a single language out of all available languages in combination with the target language. Based on these results, we then combined a multitude of languages. We also examined the influence of the amount of training data per language, as well as different techniques for combining the languages during network training. Our experiments show that data from arbitrary additional languages does not necessarily increase the performance of a system. But when combining a suitable set of languages, a significant gain in performance can be achieved.

A Database of Freely Written Texts of German School Students for the Purpose of Automatic Spelling Error Classification
Kay Berkling | Johanna Fay | Masood Ghayoomi | Katrin Hein | Rémi Lavalley | Ludwig Linhuber | Sebastian Stüker
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The spelling competence of school students is best measured on freely written texts, instead of pre-determined, dictated texts. Since the analysis of the error categories in these kinds of texts is very labor intensive and costly, we are working on an automatic system to perform this task. The modules of the system are derived from techniques from the area of natural language processing, and are learning systems that need large amounts of training data. To obtain the data necessary for training and evaluating the resulting system, we conducted a data collection of freely written German texts by school children. 1,730 students from grades 1 through 8 participated in this data collection. The data was transcribed electronically and annotated with its corrected version. This resulted in a total of 14,563 sentences that can now be used for research regarding spelling diagnostics. Additional meta-data was collected regarding writers’ language biography, teaching methodology, age, gender, and school year. In order to do a detailed manual annotation of the categories of the spelling errors committed by the students we developed a tool specifically tailored to the task.

A Corpus of Spontaneous Speech in Lectures: The KIT Lecture Corpus for Spoken Language Processing and Translation
Eunah Cho | Sarah Fünfer | Sebastian Stüker | Alex Waibel
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

With the increasing number of applications handling spontaneous speech, the need to process spoken language becomes stronger. Handling speech disfluencies is one of the most challenging tasks in automatic speech processing. As most applications are trained with well-formed, written texts, many issues arise when processing spontaneous speech due to its distinctive characteristics. Therefore, more data with annotated speech disfluencies will help the adaptation of natural language processing applications, such as machine translation systems. In order to support this, we have annotated speech disfluencies in German lectures at KIT. In this paper we describe how we annotated the disfluencies in the data and provide detailed statistics on the size of the corpus and the speakers. Moreover, machine translation performance on a source text including disfluencies is compared to the results of translating a source text with different sorts of disfluencies removed or with no disfluencies at all.

2013

Report on the 10th IWSLT evaluation campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Marcello Federico
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

The paper overviews the tenth evaluation campaign organized by the IWSLT workshop. The 2013 evaluation offered multiple tracks on lecture transcription and translation based on the TED Talks corpus. In particular, this year IWSLT included two automatic speech recognition tracks, on English and German, three speech translation tracks, from English to French, English to German, and German to English, and three text translation tracks, also from English to French, English to German, and German to English. In addition to the official tracks, speech and text translation optional tracks were offered involving 12 other languages: Arabic, Spanish, Portuguese (B), Italian, Chinese, Polish, Persian, Slovenian, Turkish, Dutch, Romanian, Russian. Overall, 18 teams participated in the evaluation for a total of 217 primary runs submitted. All runs were evaluated with objective metrics on a current test set and two progress test sets, in order to measure progress against systems of the previous years. In addition, submissions of one of the official machine translation tracks were also evaluated with human post-editing.

The 2013 KIT IWSLT speech-to-text systems for German and English
Kevin Kilgour | Christian Mohr | Michael Heck | Quoc Bao Nguyen | Van Huy Nguyen | Evgeniy Shin | Igor Tseyzer | Jonas Gehring | Markus Müller | Matthias Sperber | Sebastian Stüker | Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes our English Speech-to-Text (STT) systems for the 2013 IWSLT TED ASR track. The systems consist of multiple subsystems that are combinations of different front-ends, e.g. MVDR-MFCC based and lMel based ones, GMM and NN acoustic models and different phone sets. The outputs of the subsystems are combined via confusion network combination. Decoding is done in two stages, where the systems of the second stage are adapted in an unsupervised manner on the combination of the first stage outputs using VTLN, MLLR, and cMLLR.

The 2013 KIT Quaero speech-to-text system for French
Joshua Winebarger | Bao Nguyen | Jonas Gehring | Sebastian Stüker | Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

This paper describes our Speech-to-Text (STT) system for French, which was developed as part of our efforts in the Quaero program for the 2013 evaluation. Our STT system consists of six subsystems which were created by combining multiple complementary sources of pronunciation modeling, including graphemes, with various feature front-ends based on deep neural networks and tonal features. Both speaker-independent and speaker-adaptively trained versions of the systems were built. The resulting systems were then combined via confusion network combination and cross-adaptation. Through progressive advances and system combination we reach a word error rate (WER) of 16.5% on the 2012 Quaero evaluation data.

Incremental unsupervised training for university lecture recognition
Michael Heck | Sebastian Stüker | Sakriani Sakti | Alex Waibel | Satoshi Nakamura
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

In this paper we describe our work on unsupervised adaptation of the acoustic model of our simultaneous lecture translation system. We trained a speaker-independent acoustic model, with which we produce automatic transcriptions of new lectures in order to improve the system for a specific lecturer. We compare our results against a model that was trained in a supervised way on an exact manual transcription. We examine four different ways of processing the decoder outputs of the automatic transcription with respect to the treatment of pronunciation variants and noise words. We will show that, instead of fixing the latter information in the transcriptions, it is advantageous to let the Viterbi algorithm decide during training which pronunciations to use and where to insert which noise words. Further, we utilize word-level posterior probabilities obtained during decoding by weighting and thresholding the words of a transcription.
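The thresholding idea mentioned at the end of this abstract can be pictured with a minimal sketch: words from an automatic transcription are kept only if their posterior probability reaches a threshold, and each kept word carries its posterior as a weight. The function name, the threshold value, and the sample hypothesis are hypothetical; how the weights enter acoustic model training in the paper is not specified here.

```python
from typing import List, Tuple

# A hypothesis is a list of (word, posterior probability) pairs.
Hypothesis = List[Tuple[str, float]]


def filter_by_posterior(hyp: Hypothesis, threshold: float) -> Hypothesis:
    """Drop words whose posterior is below the threshold.

    Each surviving word keeps its posterior, which a training procedure
    could use as a per-word weight (an assumption of this sketch).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [(word, p) for word, p in hyp if p >= threshold]


# Toy decoder output with invented confidence values.
hyp = [("the", 0.98), ("lecture", 0.91), ("uh", 0.32), ("starts", 0.77)]
kept = filter_by_posterior(hyp, threshold=0.5)
kept_words = [word for word, _ in kept]
```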

Maximum entropy language modeling for Russian ASR
Evgeniy Shin | Sebastian Stüker | Kevin Kilgour | Christian Fügen | Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

Russian is a challenging language for automatic speech recognition systems due to its rich morphology. This rich morphology stems from Russian’s highly inflectional nature and the frequent use of prefixes and suffixes. Also, Russian has a very free word order, changes in which are used to reflect connotations of the sentences. Dealing with these phenomena is rather difficult for traditional n-gram models. We therefore investigate in this paper the use of a maximum entropy language model for Russian whose features are specifically designed to deal with the inflections in Russian, as well as the loose word order. We combine this with a subword-based language model in order to alleviate the problem of large vocabulary sizes necessary for dealing with highly inflecting languages. Applying the maximum entropy language model during re-scoring improves the word error rate of our recognition system by 1.2% absolute, while the use of the subword-based language model reduces the vocabulary size from 120k to 40k and the OOV rate from 4.8% to 2.1%.
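The OOV rate quoted in this abstract is conventionally the share of running words in a test text that are missing from the recognizer's vocabulary. A minimal sketch of that computation follows, assuming whitespace-tokenized text; the function name and the toy English data are illustrative and not taken from the paper.

```python
from typing import Iterable, Set


def oov_rate(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Return the fraction of running tokens not found in the vocabulary."""
    token_list = list(tokens)
    if not token_list:
        raise ValueError("empty test text")
    misses = sum(1 for token in token_list if token not in vocabulary)
    return misses / len(token_list)


# Toy example: "on" and "mat" are out of vocabulary.
vocab = {"the", "cat", "sat"}
text = "the cat sat on the mat".split()
rate = oov_rate(text, vocab)
```

Splitting words into subword units shrinks the set of distinct units the vocabulary has to cover, which is why a subword-based model can lower both vocabulary size and OOV rate at once.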

2012

The KIT Lecture Corpus for Speech Translation
Sebastian Stüker | Florian Kraft | Christian Mohr | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Academic lectures offer valuable content, but often do not reach their full potential audience due to the language barrier. Human translations of lectures are too expensive to be widely used. Speech translation technology can be an affordable alternative in this case. State-of-the-art speech translation systems utilize statistical models that need to be trained on large amounts of in-domain data. In order to support the KIT lecture translation project in its effort to introduce speech translation technology in KIT's lecture halls, we have collected a corpus of German lectures at KIT. In this paper we describe how we recorded the lectures and how we annotated them. We further give detailed statistics on the types of lectures in the corpus and its size. We collected the corpus with the purpose in mind that it should not just be suited for training a spoken language translation system the traditional way, but should also enable us to research techniques that enable the translation system to automatically and autonomously adapt itself to the varying topics and speakers of lectures.

The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Marcello Federico | Sebastian Stüker | Luisa Bentivogli | Michael Paul | Mauro Cettolo | Teresa Herrmann | Jan Niehues | Giovanni Moretti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series. That IWSLT 2011 evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 evaluation campaign, and describes the data supplied, the evaluation infrastructure made available to participants, and the subjective evaluation carried out.

The KIT-NAIST (contrastive) English ASR system for IWSLT 2012
Michael Heck | Keigo Kubo | Matthias Sperber | Sakriani Sakti | Sebastian Stüker | Christian Saam | Kevin Kilgour | Christian Mohr | Graham Neubig | Tomoki Toda | Satoshi Nakamura | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the KIT-NAIST (Contrastive) English speech recognition system for the IWSLT 2012 Evaluation Campaign. In particular, we participated in the ASR track of the IWSLT TED task. The system was developed by Karlsruhe Institute of Technology (KIT) and Nara Institute of Science and Technology (NAIST) teams in collaboration within the interACT project. We employ single system decoding with fully continuous and semi-continuous models, as well as a three-stage, multipass system combination framework built with the Janus Recognition Toolkit. On the IWSLT 2010 test set our single system introduced in this work achieves a WER of 17.6%, and our final combination achieves a WER of 14.4%.

Evaluation of interactive user corrections for lecture transcription
Heinrich Kolkhorst | Kevin Kilgour | Sebastian Stüker | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

In this work, we present and evaluate the usage of an interactive web interface for browsing and correcting lecture transcripts. An experiment performed with potential users without transcription experience provides us with a set of example corrections. On German lecture data, user corrections greatly improve the comprehensibility of the transcripts, yet only reduce the WER to 22%. The precision of user edits is relatively low at 77% and errors in inflection, case and compounds were rarely corrected. Nevertheless, characteristic lecture data errors, such as highly specific terms, were typically corrected, providing valuable additional information.

2011

Overview of the IWSLT 2011 evaluation campaign
Marcello Federico | Luisa Bentivogli | Michael Paul | Sebastian Stüker
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

We report here on the eighth Evaluation Campaign organized by the IWSLT workshop. This year, the IWSLT evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 Evaluation Campaign, which includes: descriptions of the supplied data and evaluation specifications of each track, the list of participants specifying their submitted runs, a detailed description of the subjective evaluation carried out, the main findings of each exercise drawn from the results and the system descriptions prepared by the participants, and, finally, several detailed tables reporting all the evaluation results.

The 2011 KIT English ASR system for the IWSLT evaluation
Sebastian Stüker | Kevin Kilgour | Christian Saam | Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes our English Speech-to-Text (STT) system for the 2011 IWSLT ASR track. The system consists of 2 subsystems with different front-ends—one MVDR based, one MFCC based—which are combined using confusion network combination to provide a base for a second pass speaker adapted MVDR system. We demonstrate that this set-up produces competitive results on the IWSLT 2010 dev and test sets.

Speech recognition for machine translation in Quaero
Lori Lamel | Sandrine Courcinous | Julien Despres | Jean-Luc Gauvain | Yvan Josse | Kevin Kilgour | Florian Kraft | Viet-Bac Le | Hermann Ney | Markus Nußbaum-Thom | Ilya Oparin | Tim Schlippe | Ralf Schlüter | Tanja Schultz | Thiago Fraga da Silva | Sebastian Stüker | Martin Sundermeyer | Bianca Vieru | Ngoc Thang Vu | Alexander Waibel | Cécile Woehrling
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the speech-to-text systems used to provide automatic transcriptions used in the Quaero 2010 evaluation of Machine Translation from speech. Quaero (www.quaero.org) is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR transcript is the result of a Rover combination of systems from three teams (KIT, RWTH, LIMSI+VR) for the French and German languages. The case-sensitive word error rates (WER) of the combined systems were respectively 20.8% and 18.1% on the 2010 evaluation data, relative WER reductions of 14.6% and 17.4% respectively over the best component system.

The 2011 KIT QUAERO speech-to-text system for Spanish
Kevin Kilgour | Christian Saam | Christian Mohr | Sebastian Stüker | Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

This paper describes our current Spanish speech-to-text (STT) system with which we participated in the 2011 Quaero STT evaluation that is being developed within the Quaero program. The system consists of 4 separate subsystems: as well as the standard MFCC and MVDR phoneme based subsystems, we included both a phoneme and a grapheme based bottleneck subsystem. We carefully evaluate the performance of each subsystem. After including several new techniques we were able to reduce the WER by over 30% from 20.79% to 14.53%.

2010

Overview of the IWSLT 2010 evaluation campaign
Michael Paul | Marcello Federico | Sebastian Stüker
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper gives an overview of the evaluation campaign results of the 7th International Workshop on Spoken Language Translation (IWSLT 2010). This year, we focused on three spoken language tasks: (1) public speeches on a variety of topics (TALK) from English to French, (2) spoken dialog in travel situations (DIALOG) between Chinese and English, and (3) traveling expressions (BTEC) from Arabic, Turkish, and French to English. In total, 28 teams (including 7 first-time participants) took part in the shared tasks, submitting 60 primary and 112 contrastive runs. Automatic and subjective evaluations of the primary runs were carried out in order to investigate the impact of different communication modalities, spoken language styles and semantic context on automatic speech recognition (ASR) and machine translation (MT) system performances.

2007

The CMU TransTac 2007 eyes-free two-way speech-to-speech translation system
Nguyen Bach | Matthais Eck | Paisarn Charoenpornsawat | Thilo Köhler | Sebastian Stüker | ThuyLinh Nguyen | Roger Hsiao | Alex Waibel | Stephan Vogel | Tanja Schultz | Alan W. Black
Proceedings of the Fourth International Workshop on Spoken Language Translation

The paper describes our portable two-way speech-to-speech translation system using a completely eyes-free/hands-free user interface. This system translates between the language pair English and Iraqi Arabic as well as between English and Farsi, and was built within the framework of the DARPA TransTac program. The Farsi language support was developed within a 90-day period, testing our ability to rapidly support new languages. The paper gives an overview of the system’s components along with the individual component objective measures and a discussion of issues relevant for the overall usage of the system. We found that usability, flexibility, and robustness serve as severe constraints on system architecture and design.