Ngoc-Quan Pham

Also published as: Ngoc Quan Pham


KIT’s IWSLT 2021 Offline Speech Translation System
Tuan Nam Nguyen | Thai Son Nguyen | Christian Huber | Ngoc-Quan Pham | Thanh-Le Ha | Felix Schneider | Sebastian Stüker
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes KIT’s submission to the IWSLT 2021 Offline Speech Translation Task. We describe a system in both the cascaded condition and the end-to-end condition. In the cascaded condition, we investigated different end-to-end architectures for the speech recognition module. For the text segmentation module, we trained a small transformer-based model on high-quality monolingual data. For the translation module, we reused last year’s neural machine translation model. In the end-to-end condition, we improved our Speech Relative Transformer architecture to reach or even surpass the result of the cascade system.

Multilingual Speech Translation KIT @ IWSLT2021
Ngoc-Quan Pham | Tuan Nam Nguyen | Thanh-Le Ha | Sebastian Stüker | Alexander Waibel | Dan He
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the submission of Karlsruhe Institute of Technology (KIT) to the multilingual TEDx translation task in the IWSLT 2021 evaluation campaign. Our main approach is to develop both cascade and end-to-end systems and eventually combine them to achieve the best possible results for this extremely low-resource setting. The report also confirms certain consistent architectural improvements added to the Transformer architecture, for all tasks: translation, transcription and speech translation.


KIT’s IWSLT 2020 SLT Translation System
Ngoc-Quan Pham | Felix Schneider | Tuan-Nam Nguyen | Thanh-Le Ha | Thai Son Nguyen | Maximilian Awiszus | Sebastian Stüker | Alexander Waibel
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes KIT’s submissions to the IWSLT2020 Speech Translation evaluation campaign. We first participate in the simultaneous translation task, in which our simultaneous models are Transformer based and can be efficiently trained to obtain low latency with minimized compromise in quality. On the offline speech translation task, we applied our new Speech Transformer architecture to end-to-end speech translation. The obtained model can provide translation quality which is competitive with a complicated cascade. The latter still has the upper hand, thanks to the ability to transparently access the transcription and resegment the inputs to avoid fragmentation.


Self-Attentional Models for Lattice Inputs
Matthias Sperber | Graham Neubig | Ngoc-Quan Pham | Alex Waibel
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Lattices are an efficient and effective method to encode ambiguity of upstream systems in natural language processing tasks, for example to compactly capture multiple speech recognition hypotheses, or to represent multiple linguistic analyses. Previous work has extended recurrent neural networks to model lattice inputs and achieved improvements in various tasks, but these models suffer from very slow computation speeds. This paper extends the recently proposed paradigm of self-attention to handle lattice inputs. Self-attention is a sequence modeling technique that relates inputs to one another by computing pairwise similarities and has gained popularity for both its strong results and its computational efficiency. To extend such models to handle lattices, we introduce probabilistic reachability masks that incorporate lattice structure into the model and support lattice scores if available. We also propose a method for adapting positional embeddings to lattice structures. We apply the proposed model to a speech translation task and find that it outperforms all examined baselines while being much faster to compute than previous neural lattice models during both training and inference.

Improving Zero-shot Translation with Language-Independent Constraints
Ngoc-Quan Pham | Jan Niehues | Thanh-Le Ha | Alexander Waibel
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e., zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation which also allows us to better understand how the model captures information between languages. In this work, we carried out an investigation on this capability of the multilingual NMT models. First, we intentionally create an encoder architecture which is independent with respect to the source language. Such experiments shed light on the ability of NMT encoders to learn multilingual representations, in general. Based on such proof of concept, we were able to design regularization methods into the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions. We investigated the behaviour of such models on the standard IWSLT 2017 multilingual dataset. We achieved an average improvement of 2.23 BLEU points across 12 language pairs compared to the zero-shot performance of a state-of-the-art multilingual system. Additionally, we carry out further experiments in which the effect is confirmed even for language pairs with multiple intermediate pivots.

Modeling Confidence in Sequence-to-Sequence Models
Jan Niehues | Ngoc-Quan Pham
Proceedings of the 12th International Conference on Natural Language Generation

Recently, significant improvements have been achieved in various natural language processing tasks using neural sequence-to-sequence models. While aiming for the best generation quality is important, ultimately it is also necessary to develop models that can assess the quality of their output. In this work, we propose to use the similarity between training and test conditions as a measure for models’ confidence. We investigate methods solely using the similarity as well as methods combining it with the posterior probability. While traditionally only target tokens are annotated with confidence measures, we also investigate methods to annotate source tokens with confidence. By learning an internal alignment model, we can significantly improve confidence projection over using state-of-the-art external alignment tools. We evaluate the proposed methods on downstream confidence estimation for machine translation (MT). We show improvements on segment-level confidence estimation as well as on confidence estimation for source tokens. In addition, we show that the same methods can also be applied to other tasks using sequence-to-sequence models. On the automatic speech recognition (ASR) task, we are able to find 60% of the errors by looking at 20% of the data.


KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning
Florian Dessloch | Thanh-Le Ha | Markus Müller | Jan Niehues | Thai-Son Nguyen | Ngoc-Quan Pham | Elizabeth Salesky | Matthias Sperber | Sebastian Stüker | Thomas Zenkel | Alexander Waibel
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

In today’s globalized world we have the ability to communicate with people across the world. However, in many situations the language barrier still presents a major issue. For example, many foreign students coming to KIT to study are initially unable to follow a lecture in German. Therefore, we offer an automatic simultaneous interpretation service for students. To fulfill this task, we have developed a low-latency translation system that is adapted to lectures and covers several language pairs. While the switch from traditional Statistical Machine Translation to Neural Machine Translation (NMT) significantly improved performance, integrating NMT into the speech translation framework required several adjustments. We have addressed the run-time constraints and different types of input. Furthermore, we utilized one-shot learning to easily add new topic-specific terms to the system. Besides better performance, NMT also enabled us to increase our covered languages through multilingual NMT. Combining these techniques, we are able to provide an adapted speech translation system for several European languages.

Towards one-shot learning for rare-word translation with external experts
Ngoc-Quan Pham | Jan Niehues | Alexander Waibel
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Neural machine translation (NMT) has significantly improved the quality of automatic translation models. One of the main challenges in current systems is the translation of rare words. We present a generic approach to address this weakness by having external models annotate the training data as Experts, and control the model-expert interaction with a pointer network and reinforcement learning. Our experiments using phrase-based models to simulate Experts to complement neural machine translation models show that the model can be trained to copy the annotations into the output consistently. We demonstrate the benefit of our proposed framework in out-of-domain translation scenarios with only lexical resources, improving by more than 1.0 BLEU point in both translation directions English-Spanish and German-English.

The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018
Ngoc-Quan Pham | Jan Niehues | Alexander Waibel
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present our experiments in the scope of the news translation task in WMT 2018, in the English→German direction. At the core of our systems are encoder-decoder based neural machine translation models using the transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit the memory consumption, we were able to train models that are 4 times larger on one GPU and improve the performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus. Thereby, we could improve the effectiveness of the corpus by 0.5 BLEU points.

KIT-Multi: A Translation-Oriented Multilingual Embedding Corpus
Thanh-Le Ha | Jan Niehues | Matthias Sperber | Ngoc Quan Pham | Alexander Waibel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


The QT21 Combined Machine Translation System for English to Latvian
Jan-Thorsten Peter | Hermann Ney | Ondřej Bojar | Ngoc-Quan Pham | Jan Niehues | Alex Waibel | Franck Burlot | François Yvon | Mārcis Pinnis | Valters Šics | Jasmijn Bastings | Miguel Rios | Wilker Aziz | Philip Williams | Frédéric Blain | Lucia Specia
Proceedings of the Second Conference on Machine Translation

The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2017
Ngoc-Quan Pham | Jan Niehues | Thanh-Le Ha | Eunah Cho | Matthias Sperber | Alexander Waibel
Proceedings of the Second Conference on Machine Translation


The LAMBADA dataset: Word prediction requiring a broad discourse context
Denis Paperno | Germán Kruszewski | Angeliki Lazaridou | Ngoc Quan Pham | Raffaella Bernardi | Sandro Pezzelle | Marco Baroni | Gemma Boleda | Raquel Fernández
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Convolutional Neural Network Language Models
Ngoc-Quan Pham | German Kruszewski | Gemma Boleda
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing


Predicting Pronouns across Languages with Continuous Word Spaces
Ngoc-Quan Pham | Lonneke van der Plas
Proceedings of the Second Workshop on Discourse in Machine Translation


The speech recognition and machine translation system of IOIT for IWSLT 2013
Ngoc-Quan Pham | Hai-Son Le | Tat-Thang Vu | Chi-Mai Luong
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the Automatic Speech Recognition (ASR) and Machine Translation (MT) systems developed by IOIT for the evaluation campaign of IWSLT2013. For the ASR task, using the Kaldi toolkit, we developed a system based on weighted finite-state transducers. The system is constructed by applying several techniques, notably subspace Gaussian mixture models, speaker adaptation, discriminative training, system combination and SOUL, a neural network language model. The techniques used for automatic segmentation are also described. In addition, we compared different types of SOUL models in order to study the impact of words from previous sentences on word prediction in language modeling. For the MT task, the baseline system was built on the open source toolkit N-code and then augmented with SOUL on top, i.e., in the N-best rescoring phase.