Thai Son Nguyen

Also published as: Thai-Son Nguyen


2021

pdf bib
KIT’s IWSLT 2021 Offline Speech Translation System
Tuan Nam Nguyen | Thai Son Nguyen | Christian Huber | Ngoc-Quan Pham | Thanh-Le Ha | Felix Schneider | Sebastian Stüker
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes KIT’submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both cascaded condition and end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, our last year’s neural machine translation model was reused. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.

pdf bib
ELITR Multilingual Live Subtitling: Demo and Strategy
Ondřej Bojar | Dominik Macháček | Sangeet Sagar | Otakar Smrž | Jonáš Kratochvíl | Peter Polák | Ebrahim Ansari | Mohammad Mahmoudi | Rishu Kumar | Dario Franceschini | Chiara Canton | Ivan Simonini | Thai-Son Nguyen | Felix Schneider | Sebastian Stüker | Alex Waibel | Barry Haddow | Rico Sennrich | Philip Williams
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

This paper presents an automatic speech translation system aimed at live subtitling of conference presentations. We describe the overall architecture and key processing components. More importantly, we explain our strategy for building a complex system for end-users from numerous individual components, each of which has been tested only in laboratory conditions. The system is a working prototype that is routinely tested in recognizing English, Czech, and German speech and presenting it translated simultaneously into 42 target languages.

2020

pdf bib
KIT’s IWSLT 2020 SLT Translation System
Ngoc-Quan Pham | Felix Schneider | Tuan-Nam Nguyen | Thanh-Le Ha | Thai Son Nguyen | Maximilian Awiszus | Sebastian Stüker | Alexander Waibel
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes KIT’s submissions to the IWSLT2020 Speech Translation evaluation campaign. We first participate in the simultaneous translation task, in which our simultaneous models are Transformer based and can be efficiently trained to obtain low latency with minimized compromise in quality. On the offline speech translation task, we applied our new Speech Transformer architecture to end-to-end speech translation. The obtained model can provide translation quality which is competitive to a complicated cascade. The latter still has the upper hand, thanks to the ability to transparently access to the transcription, and resegment the inputs to avoid fragmentation.

pdf bib
ELITR Non-Native Speech Translation at IWSLT 2020
Dominik Macháček | Jonáš Kratochvíl | Sangeet Sagar | Matúš Žilinec | Ondřej Bojar | Thai-Son Nguyen | Felix Schneider | Philip Williams | Yuekun Yao
Proceedings of the 17th International Conference on Spoken Language Translation

This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-to-end general ASR system, and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.

pdf bib
ELITR: European Live Translator
Ondřej Bojar | Dominik Macháček | Sangeet Sagar | Otakar Smrž | Jonáš Kratochvíl | Ebrahim Ansari | Dario Franceschini | Chiara Canton | Ivan Simonini | Thai-Son Nguyen | Felix Schneider | Sebastian Stücker | Alex Waibel | Barry Haddow | Rico Sennrich | Philip Williams
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

ELITR (European Live Translator) project aims to create a speech translation system for simultaneous subtitling of conferences and online meetings targetting up to 43 languages. The technology is tested by the Supreme Audit Office of the Czech Republic and by alfaview®, a German online conferencing system. Other project goals are to advance document-level and multilingual machine translation, automatic speech recognition, and automatic minuting.

pdf bib
Removing European Language Barriers with Innovative Machine Translation Technology
Dario Franceschini | Chiara Canton | Ivan Simonini | Armin Schweinfurth | Adelheid Glott | Sebastian Stüker | Thai-Son Nguyen | Felix Schneider | Thanh-Le Ha | Alex Waibel | Barry Haddow | Philip Williams | Rico Sennrich | Ondřej Bojar | Sangeet Sagar | Dominik Macháček | Otakar Smrž
Proceedings of the 1st International Workshop on Language Technology Platforms

This paper presents our progress towards deploying a versatile communication platform in the task of highly multilingual live speech translation for conferences and remote meetings live subtitling. The platform has been designed with a focus on very low latency and high flexibility while allowing research prototypes of speech and text processing tools to be easily connected, regardless of where they physically run. We outline our architecture solution and also briefly compare it with the ELG platform. Technical details are provided on the most important components and we summarize the test deployment events we ran so far.

2019

pdf bib
The IWSLT 2019 KIT Speech Translation System
Ngoc-Quan Pham | Thai-Son Nguyen | Thanh-Le Ha | Juan Hussain | Felix Schneider | Jan Niehues | Sebastian Stüker | Alexander Waibel
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes KIT’s submission to the IWSLT 2019 Speech Translation task on two sub-tasks corresponding to two different datasets. We investigate different end-to-end architectures for the speech recognition module, including our new transformer-based architectures. Overall, our modules in the pipe-line are based on the transformer architecture which has recently achieved great results in various fields. In our systems, using transformer is also advantageous compared to traditional hybrid systems in term of simplicity while still having competent results.

2018

pdf bib
KIT’s IWSLT 2018 SLT Translation System
Matthias Sperber | Ngoc-Quan Pham | Thai-Son Nguyen | Jan Niehues | Markus Müller | Thanh-Le Ha | Sebastian Stüker | Alex Waibel
Proceedings of the 15th International Conference on Spoken Language Translation

This paper describes KIT’s submission to the IWSLT 2018 Translation task. We describe a system participating in the baseline condition and a system participating in the end-to-end condition. The baseline system is a cascade of an ASR system, a system to segment the ASR output and a neural machine translation system. We investigate the combination of different ASR systems. For the segmentation and machine translation components, we focused on transformer-based architectures.

pdf bib
KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning
Florian Dessloch | Thanh-Le Ha | Markus Müller | Jan Niehues | Thai-Son Nguyen | Ngoc-Quan Pham | Elizabeth Salesky | Matthias Sperber | Sebastian Stüker | Thomas Zenkel | Alexander Waibel
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

In today’s globalized world we have the ability to communicate with people across the world. However, in many situations the language barrier still presents a major issue. For example, many foreign students coming to KIT to study are initially unable to follow a lecture in German. Therefore, we offer an automatic simultaneous interpretation service for students. To fulfill this task, we have developed a low-latency translation system that is adapted to lectures and covers several language pairs. While the switch from traditional Statistical Machine Translation to Neural Machine Translation (NMT) significantly improved performance, to integrate NMT into the speech translation framework required several adjustments. We have addressed the run-time constraints and different types of input. Furthermore, we utilized one-shot learning to easily add new topic-specific terms to the system. Besides better performance, NMT also enabled us increase our covered languages through multilingual NMT. % Combining these techniques, we are able to provide an adapted speech translation system for several European languages.

2017

pdf bib
The 2017 KIT IWSLT Speech-to-Text Systems for English and German
Thai-Son Nguyen | Markus Müller | Matthias Sperber | Thomas Zenkel | Sebastian Stüker | Alex Waibel
Proceedings of the 14th International Conference on Spoken Language Translation

This paper describes our German and English Speech-to-Text (STT) systems for the 2017 IWSLT evaluation campaign. The campaign focuses on the transcription of unsegmented lecture talks. Our setup includes systems using both the Janus and Kaldi frameworks. We combined the outputs using both ROVER [1] and confusion network combination (CNC) [2] to achieve a good overall performance. The individual subsystems are built by using different speaker-adaptive feature combination (e.g., lMEL with i-vector or bottleneck speaker vector), acoustic models (GMM or DNN) and speaker adaptation (MLLR or fMLLR). Decoding is performed in two stages, where the GMM and DNN systems are adapted on the combination of the first stage outputs using MLLR, and fMLLR. The combination setup produces a final hypothesis that has a significantly lower WER than any of the individual sub-systems. For the English lecture task, our best combination system has a WER of 8.3% on the tst2015 development set while our other combinations gained 25.7% WER for German lecture tasks.

2016

pdf bib
Lecture Translator - Speech translation framework for simultaneous lecture translation
Markus Müller | Thai Son Nguyen | Jan Niehues | Eunah Cho | Bastian Krüger | Thanh-Le Ha | Kevin Kilgour | Matthias Sperber | Mohammed Mediani | Sebastian Stüker | Alex Waibel
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
The 2016 KIT IWSLT Speech-to-Text Systems for English and German
Thai-Son Nguyen | Markus Müller | Matthias Sperber | Thomas Zenkel | Kevin Kilgour | Sebastian Stüker | Alex Waibel
Proceedings of the 13th International Conference on Spoken Language Translation

This paper describes our German and English Speech-to-Text (STT) systems for the 2016 IWSLT evaluation campaign. The campaign focuses on the transcription of unsegmented TED talks. Our setup includes systems using both the Janus and Kaldi frameworks. We combined the outputs using both ROVER [1] and confusion network combination (CNC) [2] to archieve a good overall performance. The individual subsystems are built by using different speaker-adaptive feature combination (e.g., lMEL with i-vector or bottleneck speaker vector), acoustic models (GMM or DNN) and speaker adaption (MLLR or fMLLR). Decoding is performed in two stages, where the GMM and DNN systems are adapted on the combination of the first stage outputs using MLLR, and fMLLR. The combination setup produces a final hypothesis that has a significantly lower WER than any of the individual subsystems. For the English TED task, our best combination system has a WER of 7.8% on the development set while our other combinations gained 21.8% and 28.7% WERs for the English and German MSLT tasks.