2023
Speech Translation with Style: AppTek’s Submissions to the IWSLT Subtitling and Formality Tracks in 2023
Parnia Bahar | Patrick Wilken | Javier Iranzo-Sánchez | Mattia Di Gangi | Evgeny Matusov | Zoltán Tüske
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
AppTek participated in the subtitling and formality tracks of the IWSLT 2023 evaluation. This paper describes the details of our subtitling pipeline (speech segmentation, speech recognition, punctuation prediction and inverse text normalization, text machine translation and direct speech-to-text translation, and intelligent line segmentation) and how we make use of the provided subtitling-specific data in training and fine-tuning. The evaluation results show that our final submissions are competitive, in particular outperforming the submissions by other participants by 5% absolute as measured by the SubER subtitle quality metric. For the formality track, we participate with our En-Ru and En-Pt production models, which support formality control via prefix tokens. Except for informal Portuguese, we achieve near-perfect formality level accuracy while at the same time offering high general translation quality.
2016
The RWTH Aachen LVCSR system for IWSLT-2016 German Skype conversation recognition task
Wilfried Michel | Zoltán Tüske | M. Ali Basha Shaik | Ralf Schlüter | Hermann Ney
Proceedings of the 13th International Conference on Spoken Language Translation
In this paper, the RWTH large vocabulary continuous speech recognition (LVCSR) systems developed for the IWSLT-2016 evaluation campaign are described. This evaluation campaign focuses on transcribing spontaneous speech from Skype recordings. State-of-the-art bidirectional long short-term memory (LSTM) and deep, multilingually boosted feed-forward neural network (FFNN) acoustic models are trained on narrowband and broadband features. An open vocabulary approach using subword units is also considered. LSTM and count-based full word and hybrid backoff language modeling methods are used to model the morphological richness of the German language. All these approaches are combined using confusion network combination (CNC) to yield a competitive WER.
2013
The RWTH Aachen German and English LVCSR systems for IWSLT-2013
M. Ali Basha Shaik | Zoltan Tüske | Simon Wiesler | Markus Nußbaum-Thom | Stephan Peitz | Ralf Schlüter | Hermann Ney
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, German and English large vocabulary continuous speech recognition (LVCSR) systems developed by RWTH Aachen University for the IWSLT-2013 evaluation campaign are presented. Good improvements are obtained with state-of-the-art monolingual and multilingual bottleneck features. In addition, an open vocabulary approach using morphemic sub-lexical units is investigated along with language model adaptation for the German LVCSR system. For both languages, competitive WERs are achieved using system combination.