Jan Niehues


2021

pdf bib
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
Marcello Federico | Alex Waibel | Marta R. Costa-jussà | Jan Niehues | Sebastian Stuker | Elizabeth Salesky
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

pdf bib
FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN
Antonios Anastasopoulos | Ondřej Bojar | Jacob Bremerman | Roldano Cattoni | Maha Elbayad | Marcello Federico | Xutai Ma | Satoshi Nakamura | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Alexander Waibel | Changhan Wang | Matthew Wiesner
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured this year four shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation. A total of 22 teams participated in at least one of the tasks. This paper describes each shared task, data and evaluation metrics, and reports results of the received submissions.

pdf bib
Maastricht University’s Multilingual Speech Translation System for IWSLT 2021
Danni Liu | Jan Niehues
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes Maastricht University’s participation in the IWSLT 2021 multilingual speech translation track. The task in this track is to build multilingual speech translation systems in supervised and zero-shot directions. Our primary system is an end-to-end model that performs both speech transcription and translation. We observe that the joint training for the two tasks is complementary especially when the speech translation data is scarce. On the source and target side, we use data augmentation and pseudo-labels respectively to improve the performance of our systems. We also introduce an ensembling technique that consistently improves the quality of transcriptions and translations. The experiments show that the end-to-end system is competitive with its cascaded counterpart especially in zero-shot conditions.

pdf bib
Continuous Learning in Neural Machine Translation using Bilingual Dictionaries
Jan Niehues
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

While recent advances in deep learning led to significant improvements in machine translation, neural machine translation is often still not able to continuously adapt to the environment. For humans, as well as for machine translation, bilingual dictionaries are a promising knowledge source to continuously integrate new knowledge. However, their exploitation poses several challenges: The system needs to be able to perform one-shot learning as well as model the morphology of source and target language. In this work, we proposed an evaluation framework to assess the ability of neural machine translation to continuously learn new phrases. We integrate one-shot learning methods for neural machine translation with different word representations and show that it is important to address both in order to successfully make use of bilingual dictionaries. By addressing both challenges we are able to improve the ability to translate new, rare words and phrases from 30% to up to 70%. The correct lemma is even generated by more than 90%.

pdf bib
Tutorial Proposal: End-to-End Speech Translation
Jan Niehues | Elizabeth Salesky | Marco Turchi | Matteo Negri
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Speech translation is the translation of speech in one language typically to text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation. Speech translation has attracted interest for many years, but the recent successful applications of deep learning to both individual tasks have enabled new opportunities through joint modeling, in what we today call ‘end-to-end speech translation.’ In this tutorial we will introduce the techniques used in cutting-edge research on speech translation. Starting from the traditional cascaded approach, we will given an overview on data sources and model architectures to achieve state-of-the art performance with end-to-end speech translation for both high- and low-resource languages. In addition, we will discuss methods to evaluate analyze the proposed solutions, as well as the challenges faced when applying speech translation models for real-world applications.

pdf bib
Improving Zero-Shot Translation by Disentangling Positional Information
Danni Liu | Jan Niehues | James Cross | Francisco Guzmán | Xian Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. We demonstrate that a main factor causing the language-specific representations is the positional correspondence to input tokens. We show that this can be easily alleviated by removing residual connections in an encoder layer. With this modification, we gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions. The improvements are particularly prominent between related languages, where our proposed model outperforms pivot-based translation. Moreover, our approach allows easy integration of new languages, which substantially expands translation coverage. By thorough inspections of the hidden layer outputs, we show that our approach indeed leads to more language-independent representations.

pdf bib
Unsupervised Machine Translation On Dravidian Languages
Sai Koneru | Danni Liu | Jan Niehues
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Unsupervised Neural Machine translation (UNMT) is beneficial especially for under-resourced languages such as from the Dravidian family. They learn to translate between the source and target, relying solely on only monolingual corpora. However, UNMT systems fail in scenarios that occur often when dealing with low resource languages. Recent works have achieved state-of-the-art results by adding auxiliary parallel data with similar languages. In this work, we focus on unsupervised translation between English and Kannada by using limited amounts of auxiliary data between English and other Dravidian languages. We show that transliteration is essential in unsupervised translation between Dravidian languages, as they do not share a common writing system. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for dissimilar language pairs. We show from our experiments it is crucial for Kannada and reference languages to be similar. Further, we propose a method to measure language similarity to choose the most beneficial reference languages.

2020

pdf bib
Proceedings of the 17th International Conference on Spoken Language Translation
Marcello Federico | Alex Waibel | Kevin Knight | Satoshi Nakamura | Hermann Ney | Jan Niehues | Sebastian Stüker | Dekai Wu | Joseph Mariani | Francois Yvon
Proceedings of the 17th International Conference on Spoken Language Translation

pdf bib
FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN
Ebrahim Ansari | Amittai Axelrod | Nguyen Bach | Ondřej Bojar | Roldano Cattoni | Fahim Dalvi | Nadir Durrani | Marcello Federico | Christian Federmann | Jiatao Gu | Fei Huang | Kevin Knight | Xutai Ma | Ajay Nagesh | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Xing Shi | Sebastian Stüker | Marco Turchi | Alexander Waibel | Changhan Wang
Proceedings of the 17th International Conference on Spoken Language Translation

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured this year six challenge tracks: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation. A total of teams participated in at least one of the tracks. This paper introduces each track’s goal, data and evaluation metrics, and reports the results of the received submissions.

pdf bib
Adapting End-to-End Speech Recognition for Readable Subtitles
Danni Liu | Jan Niehues | Gerasimos Spanakis
Proceedings of the 17th International Conference on Spoken Language Translation

Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that with limited data far less than needed for training a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system.

pdf bib
Machine Translation with Unsupervised Length-Constraints
Jan Niehues
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

2019

pdf bib
Incremental processing of noisy user utterances in the spoken language understanding task
Stefan Constantin | Jan Niehues | Alex Waibel
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

The state-of-the-art neural network architectures make it possible to create spoken language understanding systems with high quality and fast processing time. One major challenge for real-world applications is the high latency of these systems caused by triggered actions with high executions times. If an action can be separated into subactions, the reaction time of the systems can be improved through incremental processing of the user utterance and starting subactions while the utterance is still being uttered. In this work, we present a model-agnostic method to achieve high quality in processing incrementally produced partial utterances. Based on clean and noisy versions of the ATIS dataset, we show how to create datasets with our method to create low-latency natural language understanding components. We get improvements of up to 47.91 absolute percentage points in the metric F1-score.

pdf bib
Improving Zero-shot Translation with Language-Independent Constraints
Ngoc-Quan Pham | Jan Niehues | Thanh-Le Ha | Alexander Waibel
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation which also allows us to better understand how the model captures information between languages. In this work, we carried out an investigation on this capability of the multilingual NMT models. First, we intentionally create an encoder architecture which is independent with respect to the source language. Such experiments shed light on the ability of NMT encoders to learn multilingual representations, in general. Based on such proof of concept, we were able to design regularization methods into the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions. We investigated the behaviour of such models on the standard IWSLT 2017 multilingual dataset. We achieved an average improvement of 2.23 BLEU points across 12 language pairs compared to the zero-shot performance of a state-of-the-art multilingual system. Additionally, we carry out further experiments in which the effect is confirmed even for language pairs with multiple intermediate pivots.

pdf bib
Modeling Confidence in Sequence-to-Sequence Models
Jan Niehues | Ngoc-Quan Pham
Proceedings of the 12th International Conference on Natural Language Generation

Recently, significant improvements have been achieved in various natural language processing tasks using neural sequence-to-sequence models. While aiming for the best generation quality is important, ultimately it is also necessary to develop models that can assess the quality of their output. In this work, we propose to use the similarity between training and test conditions as a measure for models’ confidence. We investigate methods solely using the similarity as well as methods combining it with the posterior probability. While traditionally only target tokens are annotated with confidence measures, we also investigate methods to annotate source tokens with confidence. By learning an internal alignment model, we can significantly improve confidence projection over using state-of-the-art external alignment tools. We evaluate the proposed methods on downstream confidence estimation for machine translation (MT). We show improvements on segment-level confidence estimation as well as on confidence estimation for source tokens. In addition, we show that the same methods can also be applied to other tasks using sequence-to-sequence models. On the automatic speech recognition (ASR) task, we are able to find 60% of the errors by looking at 20% of the data.

pdf bib
Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation
Matthias Sperber | Graham Neubig | Jan Niehues | Alex Waibel
Transactions of the Association for Computational Linguistics, Volume 7

Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained in an end-to-end fashion on a corpus of translated speech. However, experiments are inconclusive on whether the cascade or the direct model is stronger, and have only been conducted under the unrealistic assumption that both are trained on equal amounts of data, ignoring other available speech recognition and machine translation corpora. In this paper, we demonstrate that direct speech translation models require more data to perform well than cascaded models, and although they allow including auxiliary data through multi-task training, they are poor at exploiting such data, putting them at a severe disadvantage. As a remedy, we propose the use of end- to-end trainable models with two attention mechanisms, the first establishing source speech to source text alignments, the second modeling source to target text alignment. We show that such models naturally decompose into multi-task–trainable recognition and translation tasks and propose an attention-passing technique that alleviates error propagation issues in a previous formulation of a model with two attention stages. Our proposed model outperforms all examined baselines and is able to exploit auxiliary training data much more effectively than direct attentional models.

2018

pdf bib
KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning
Florian Dessloch | Thanh-Le Ha | Markus Müller | Jan Niehues | Thai-Son Nguyen | Ngoc-Quan Pham | Elizabeth Salesky | Matthias Sperber | Sebastian Stüker | Thomas Zenkel | Alexander Waibel
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

In today’s globalized world we have the ability to communicate with people across the world. However, in many situations the language barrier still presents a major issue. For example, many foreign students coming to KIT to study are initially unable to follow a lecture in German. Therefore, we offer an automatic simultaneous interpretation service for students. To fulfill this task, we have developed a low-latency translation system that is adapted to lectures and covers several language pairs. While the switch from traditional Statistical Machine Translation to Neural Machine Translation (NMT) significantly improved performance, to integrate NMT into the speech translation framework required several adjustments. We have addressed the run-time constraints and different types of input. Furthermore, we utilized one-shot learning to easily add new topic-specific terms to the system. Besides better performance, NMT also enabled us increase our covered languages through multilingual NMT. % Combining these techniques, we are able to provide an adapted speech translation system for several European languages.

pdf bib
Automated Evaluation of Out-of-Context Errors
Patrick Huber | Jan Niehues | Alex Waibel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
KIT-Multi: A Translation-Oriented Multilingual Embedding Corpus
Thanh-Le Ha | Jan Niehues | Matthias Sperber | Ngoc Quan Pham | Alexander Waibel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Robust and Scalable Differentiable Neural Computer for Question Answering
Jörg Franke | Jan Niehues | Alex Waibel
Proceedings of the Workshop on Machine Reading for Question Answering

Deep learning models are often not easily adaptable to new tasks and require task-specific adjustments. The differentiable neural computer (DNC), a memory-augmented neural network, is designed as a general problem solver which can be used in a wide range of tasks. But in reality, it is hard to apply this model to new tasks. We analyze the DNC and identify possible improvements within the application of question answering. This motivates a more robust and scalable DNC (rsDNC). The objective precondition is to keep the general character of this model intact while making its application more reliable and speeding up its required training time. The rsDNC is distinguished by a more robust training, a slim memory unit and a bidirectional architecture. We not only achieve new state-of-the-art performance on the bAbI task, but also minimize the performance variance between different initializations. Furthermore, we demonstrate the simplified applicability of the rsDNC to new tasks with passable results on the CNN RC task without adaptions.

pdf bib
Towards one-shot learning for rare-word translation with external experts
Ngoc-Quan Pham | Jan Niehues | Alexander Waibel
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Neural machine translation (NMT) has significantly improved the quality of automatic translation models. One of the main challenges in current systems is the translation of rare words. We present a generic approach to address this weakness by having external models annotate the training data as Experts, and control the model-expert interaction with a pointer network and reinforcement learning. Our experiments using phrase-based models to simulate Experts to complement neural machine translation models show that the model can be trained to copy the annotations into the output consistently. We demonstrate the benefit of our proposed framework in outof domain translation scenarios with only lexical resources, improving more than 1.0 BLEU point in both translation directions English-Spanish and German-English.

pdf bib
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018
Ngoc-Quan Pham | Jan Niehues | Alexander Waibel
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present our experiments in the scope of the news translation task in WMT 2018, in directions: English→German. The core of our systems is the encoder-decoder based neural machine translation models using the transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit the memory consumption, we were able to train models that are 4 times larger on one GPU and improve the performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus. Thereby, we could improve the effectiveness of the corpus by 0.5 BLEU points.

2017

pdf bib
Analyzing Neural MT Search and Model Performance
Jan Niehues | Eunah Cho | Thanh-Le Ha | Alex Waibel
Proceedings of the First Workshop on Neural Machine Translation

In this paper, we offer an in-depth analysis about the modeling and search performance. We address the question if a more complex search algorithm is necessary. Furthermore, we investigate the question if more complex models which might only be applicable during rescoring are promising. By separating the search space and the modeling using n-best list reranking, we analyze the influence of both parts of an NMT system independently. By comparing differently performing NMT systems, we show that the better translation is already in the search space of the translation systems with less performance. This results indicate that the current search algorithms are sufficient for the NMT systems. Furthermore, we could show that even a relatively small n-best list of 50 hypotheses already contain notably better translations.

pdf bib
Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning
Jan Niehues | Eunah Cho
Proceedings of the Second Conference on Machine Translation

pdf bib
The QT21 Combined Machine Translation System for English to Latvian
Jan-Thorsten Peter | Hermann Ney | Ondřej Bojar | Ngoc-Quan Pham | Jan Niehues | Alex Waibel | Franck Burlot | François Yvon | Mārcis Pinnis | Valters Šics | Jasmijn Bastings | Miguel Rios | Wilker Aziz | Philip Williams | Frédéric Blain | Lucia Specia
Proceedings of the Second Conference on Machine Translation

pdf bib
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2017
Ngoc-Quan Pham | Jan Niehues | Thanh-Le Ha | Eunah Cho | Matthias Sperber | Alexander Waibel
Proceedings of the Second Conference on Machine Translation

pdf bib
Neural Lattice-to-Sequence Models for Uncertain Inputs
Matthias Sperber | Graham Neubig | Jan Niehues | Alex Waibel
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The input to a neural sequence-to-sequence model is often determined by an up-stream system, e.g. a word segmenter, part of speech tagger, or speech recognizer. These up-stream models are potentially error-prone. Representing inputs through word lattices allows making this uncertainty explicit by capturing alternative sequences and their posterior probabilities in a compact form. In this work, we extend the TreeLSTM (Tai et al., 2015) into a LatticeLSTM that is able to consume word lattices, and can be used as encoder in an attentional encoder-decoder model. We integrate lattice posterior scores into this architecture by extending the TreeLSTM’s child-sum and forget gates and introducing a bias term into the attention mechanism. We experiment with speech translation lattices and report consistent improvements over baselines that translate either the 1-best hypothesis or the lattice without posterior scores.

2016

pdf bib
Using Factored Word Representation in Neural Network Language Models
Jan Niehues | Thanh-Le Ha | Eunah Cho | Alex Waibel
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

pdf bib
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2016
Thanh-Le Ha | Eunah Cho | Jan Niehues | Mohammed Mediani | Matthias Sperber | Alexandre Allauzen | Alexander Waibel
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter | Tamer Alkhouli | Hermann Ney | Matthias Huck | Fabienne Braune | Alexander Fraser | Aleš Tamchyna | Ondřej Bojar | Barry Haddow | Rico Sennrich | Frédéric Blain | Lucia Specia | Jan Niehues | Alex Waibel | Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Elena Knyazeva | Thomas Lavergne | François Yvon | Mārcis Pinnis | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Lecture Translator - Speech translation framework for simultaneous lecture translation
Markus Müller | Thai Son Nguyen | Jan Niehues | Eunah Cho | Bastian Krüger | Thanh-Le Ha | Kevin Kilgour | Matthias Sperber | Mohammed Mediani | Sebastian Stüker | Alex Waibel
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Pre-Translation for Neural Machine Translation
Jan Niehues | Eunah Cho | Thanh-Le Ha | Alex Waibel
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recently, the development of neural machine translation (NMT) has significantly improved the translation quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine translation (SMT)-based systems, in some cases, the NMT system produces translations that have a completely different meaning. This is especially the case when rare words occur. When using statistical machine translation, it has already been shown that significant gains can be achieved by simplifying the input in a preprocessing step. A commonly used example is the pre-reordering approach. In this work, we used phrase-based machine translation to pre-translate the input into the target language. Then a neural machine translation system generates the final hypothesis using the pre-translation. Thereby, we use either only the output of the phrase-based machine translation (PBMT) system or a combination of the PBMT output and the source sentence. We evaluate the technique on the English to German translation task. Using this approach we are able to outperform the PBMT system as well as the baseline neural MT system by up to 2 BLEU points. We analyzed the influence of the quality of the initial system on the final result.

pdf bib
Lightly Supervised Quality Estimation
Matthias Sperber | Graham Neubig | Jan Niehues | Sebastian Stüker | Alex Waibel
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Evaluating the quality of output from language processing systems such as machine translation or speech recognition is an essential step in ensuring that they are sufficient for practical use. However, depending on the practical requirements, evaluation approaches can differ strongly. Often, reference-based evaluation measures (such as BLEU or WER) are appealing because they are cheap and allow rapid quantitative comparison. On the other hand, practitioners often focus on manual evaluation because they must deal with frequently changing domains and quality standards requested by customers, for which reference-based evaluation is insufficient or not possible due to missing in-domain reference data (Harris et al., 2016). In this paper, we attempt to bridge this gap by proposing a framework for lightly supervised quality estimation. We collect manually annotated scores for a small number of segments in a test corpus or document, and combine them with automatically predicted quality scores for the remaining segments to predict an overall quality estimate. An evaluation shows that our framework estimates quality more reliably than using fully automatic quality estimation approaches, while keeping annotation effort low by not requiring full references to be available for the particular domain.

2015

pdf bib
The Karlsruhe Institute of Technology Translation Systems for the WMT 2015
Eunah Cho | Thanh-Le Ha | Jan Niehues | Teresa Herrmann | Mohammed Mediani | Yuqi Zhang | Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
The KIT-LIMSI Translation System for WMT 2015
Thanh-Le Ha | Quoc-Khanh Do | Eunah Cho | Jan Niehues | Alexandre Allauzen | François Yvon | Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
ListNet-based MT Rescoring
Jan Niehues | Quoc Khanh Do | Alexandre Allauzen | Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Stripping Adjectives: Integration Techniques for Selective Stemming in SMT Systems
Isabel Slawik | Jan Niehues | Alex Waibel
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Stripping Adjectives: Integration Techniques for Selective Stemming in SMT Systems
Isabel Slawik | Jan Niehues | Alex Waibel
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
The IWSLT 2015 Evaluation Campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Roldano Cattoni | Marcello Federico
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
The KIT translation systems for IWSLT 2015
Thanh-Le Ha | Jan Niehues | Eunah Cho | Mohammed Mediani | Alex Waibel
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Source discriminative word lexicon for translation disambiguation
Teresa Herrmann | Jan Niehues | Alex Waibel
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

pdf bib
Punctuation insertion for real-time spoken language translation
Eunah Cho | Jan Niehues | Kevin Kilgour | Alex Waibel
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

2014

pdf bib
Tight Integration of Speech Disfluency Removal into SMT
Eunah Cho | Jan Niehues | Alex Waibel
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
Combining techniques from different NN-based language models for machine translation
Jan Niehues | Alexander Allauzen | François Yvon | Alex Waibel
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

This paper presents two improvements of language models based on Restricted Boltzmann Machine (RBM) for large machine translation tasks. In contrast to other continuous space approach, RBM based models can easily be integrated into the decoder and are able to directly learn a hidden representation of the n-gram. Previous work on RBM-based language models do not use a shared word representation and therefore, they might suffer of a lack of generalization for larger contexts. Moreover, since the training step is very time consuming, they are only used for quite small copora. In this work we add a shared word representation for the RBM-based language model by factorizing the weight matrix. In addition, we propose an efficient and tailored sampling algorithm that allows us to drastically speed up the training process. Experiments are carried out on two German to English translation tasks and the results show that the training time could be reduced by a factor of 10 without any drop in performance. Furthermore, the RBM-based model can also be trained on large size corpora.

pdf bib
The KIT-LIMSI Translation System for WMT 2014
Quoc Khanh Do | Teresa Herrmann | Jan Niehues | Alexander Allauzen | François Yvon | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
The Karlsruhe Institute of Technology Translation Systems for the WMT 2014
Teresa Herrmann | Mohammed Mediani | Eunah Cho | Thanh-Le Ha | Jan Niehues | Isabel Slawik | Yuqi Zhang | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
LIMSI @ WMT’14 Medical Translation Task
Nicolas Pécheux | Li Gong | Quoc Khanh Do | Benjamin Marie | Yulia Ivanishcheva | Alexander Allauzen | Thomas Lavergne | Jan Niehues | Aurélien Max | François Yvon
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Report on the 11th IWSLT evaluation campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Marcello Federico
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

The paper overviews the 11th evaluation campaign organized by the IWSLT workshop. The 2014 evaluation offered multiple tracks on lecture transcription and translation based on the TED Talks corpus. In particular, this year IWSLT included three automatic speech recognition tracks, on English, German and Italian, five speech translation tracks, from English to French, English to German, German to English, English to Italian, and Italian to English, and five text translation track, also from English to French, English to German, German to English, English to Italian, and Italian to English. In addition to the official tracks, speech and text translation optional tracks were offered, globally involving 12 other languages: Arabic, Spanish, Portuguese (B), Hebrew, Chinese, Polish, Persian, Slovenian, Turkish, Dutch, Romanian, Russian. Overall, 21 teams participated in the evaluation, for a total of 76 primary runs submitted. Participants were also asked to submit runs on the 2013 test set (progress test set), in order to measure the progress of systems with respect to the previous year. All runs were evaluated with objective metrics, and submissions for two of the official text translation tracks were also evaluated with human post-editing.

pdf bib
Combined spoken language translation
Markus Freitag | Joern Wuebker | Stephan Peitz | Hermann Ney | Matthias Huck | Alexandra Birch | Nadir Durrani | Philipp Koehn | Mohammed Mediani | Isabel Slawik | Jan Niehues | Eunach Cho | Alex Waibel | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. One of the collaborative efforts within EU-BRIDGE is to produce joint submissions of up to four different partners to the evaluation campaign at the 2014 International Workshop on Spoken Language Translation (IWSLT). We submitted combined translations to the German→English spoken language translation (SLT) track as well as to the German→English, English→German and English→French machine translation (MT) tracks. In this paper, we present the techniques which were applied by the different individual translation systems of RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show the combination approach developed at RWTH Aachen University which combined the individual systems. The consensus translations yield empirical gains of up to 2.3 points in BLEU and 1.2 points in TER compared to the best individual system.

pdf bib
The KIT translation systems for IWSLT 2014
Isabel Slawik | Mohammed Mediani | Jan Niehues | Yuqi Zhang | Eunah Cho | Teresa Herrmann | Thanh-Le Ha | Alex Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we present the KIT systems participating in the TED translation tasks of the IWSLT 2014 machine translation evaluation. We submitted phrase-based translation systems for all three official directions, namely English→German, German→English, and English→French, as well as for the optional directions English→Chinese and English→Arabic. For the official directions we built systems both for the machine translation as well as the spoken language translation track. This year we improved our systems’ performance over last year through n-best list rescoring using neural network-based translation and language models and novel preordering rules based on tree information of multiple syntactic levels. Furthermore, we could successfully apply a novel phrase extraction algorithm and transliteration of unknown words for Arabic. We also submitted a contrastive system for German→English built with stemmed German adjectives. For the SLT tracks, we used a monolingual translation system to translate the lowercased ASR hypotheses with all punctuation stripped to truecased, punctuated output as a preprocessing step to our usual translation system.

pdf bib
Machine translation of multi-party meetings: segmentation and disfluency removal strategies
Eunah Cho | Jan Niehues | Alex Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

Translating meetings presents a challenge since multi-speaker speech shows a variety of disfluencies. In this paper we investigate the importance of transforming speech into well-written input prior to translating multi-party meetings. We first analyze the characteristics of this data and establish oracle scores. Sentence segmentation and punctuation are performed using a language model, turn information, or a monolingual translation system. Disfluencies are removed by a CRF model trained on in-domain and out-of-domain data. For comparison, we build a combined CRF model for punctuation insertion and disfluency removal. By applying these models, multi-party meetings are transformed into fluent input for machine translation. We evaluate the models with regard to translation performance and are able to achieve an improvement of 2.1 to 4.9 BLEU points depending on the availability of turn information.

pdf bib
Lexical translation model using a deep neural network architecture
Thanh-Le Ha | Jan Niehues | Alex Waibel
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

In this paper we combine the advantages of a model using global source sentence contexts, the Discriminative Word Lexicon, and neural networks. By using deep neural networks instead of the linear maximum entropy model in the Discriminative Word Lexicon models, we are able to leverage dependencies between different source words due to the non-linearity. Furthermore, the models for different target words can share parameters and therefore data sparsity problems are effectively reduced. By using this approach in a state-of-the-art translation system, we can improve the performance by up to 0.5 BLEU points for three different language pairs on the TED translation task.

pdf bib
Manual Analysis of Structurally Informed Reordering in German-English Machine Translation
Teresa Herrmann | Jan Niehues | Alex Waibel
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Word reordering is a difficult task for translation. Common automatic metrics such as BLEU have problems reflecting improvements in target language word order. However, it is a crucial aspect for humans when deciding on translation quality. This paper presents a detailed analysis of a structure-aware reordering approach applied in a German-to-English phrase-based machine translation system. We compare the translation outputs of two translation systems applying reordering rules based on parts-of-speech and syntax trees on a sentence-by-sentence basis. For each sentence-pair we examine the global translation performance and classify local changes in the translated sentences. This analysis is applied to three data sets representing different genres. While the improvement in BLEU differed substantially between the data sets, the manual evaluation showed that both global translation performance as well as individual types of improvements and degradations exhibit a similar behavior throughout the three data sets. We have observed that for 55-64% of the sentences with different translations, the translation produced using the tree-based reordering was considered to be the better translation. As intended by the investigated reordering model, most improvements are achieved by improving the position of the verb or being able to translate a verb that could not be translated before.

2013

pdf bib
Combining Word Reordering Methods on different Linguistic Abstraction Levels for Statistical Machine Translation
Teresa Herrmann | Jan Niehues | Alex Waibel
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
The Karlsruhe Institute of Technology Translation Systems for the WMT 2013
Eunah Cho | Thanh-Le Ha | Mohammed Mediani | Jan Niehues | Teresa Herrmann | Isabel Slawik | Alex Waibel
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Joint WMT 2013 Submission of the QUAERO Project
Stephan Peitz | Saab Mansour | Matthias Huck | Markus Freitag | Hermann Ney | Eunah Cho | Teresa Herrmann | Mohammed Mediani | Jan Niehues | Alex Waibel | Alexander Allauzen | Quoc Khanh Do | Bianka Buschbeck | Tonio Wandmacher
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
An MT Error-Driven Discriminative Word Lexicon using Sentence Structure Features
Jan Niehues | Alex Waibel
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Letter N-Gram-based Input Encoding for Continuous Space Language Models
Henning Sperr | Jan Niehues | Alex Waibel
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality

pdf bib
Report on the 10th IWSLT evaluation campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Marcello Federico
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

The paper overviews the tenth evaluation campaign organized by the IWSLT workshop. The 2013 evaluation offered multiple tracks on lecture transcription and translation based on the TED Talks corpus. In particular, this year IWSLT included two automatic speech recognition tracks, on English and German, three speech translation tracks, from English to French, English to German, and German to English, and three text translation track, also from English to French, English to German, and German to English. In addition to the official tracks, speech and text translation optional tracks were offered involving 12 other languages: Arabic, Spanish, Portuguese (B), Italian, Chinese, Polish, Persian, Slovenian, Turkish, Dutch, Romanian, Russian. Overall, 18 teams participated in the evaluation for a total of 217 primary runs submitted. All runs were evaluated with objective metrics on a current test set and two progress test sets, in order to compare the progresses against systems of the previous years. In addition, submissions of one of the official machine translation tracks were also evaluated with human post-editing.

pdf bib
EU-BRIDGE MT: text translation of talks in the EU-BRIDGE project
Markus Freitag | Stephan Peitz | Joern Wuebker | Hermann Ney | Nadir Durrani | Matthias Huck | Philipp Koehn | Thanh-Le Ha | Jan Niehues | Mohammed Mediani | Teresa Herrmann | Alex Waibel | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

EU-BRIDGE1 is a European research project which is aimed at developing innovative speech translation technology. This paper describes one of the collaborative efforts within EUBRIDGE to further advance the state of the art in machine translation between two European language pairs, English→French and German→English. Four research institutions involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the machine translation track of the evaluation campaign at the 2013 International Workshop on Spoken Language Translation (IWSLT). We present the methods and techniques to achieve high translation quality for text translation of talks which are applied at RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show how we have been able to considerably boost translation performance (as measured in terms of the metrics BLEU and TER) by means of system combination. The joint setups yield empirical gains of up to 1.4 points in BLEU and 2.8 points in TER on the IWSLT test sets compared to the best single systems.

pdf bib
The KIT translation systems for IWSLT 2013
Than-Le Ha | Teresa Herrmann | Jan Niehues | Mohammed Mediani | Eunah Cho | Yuqi Zhang | Isabel Slawik | Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we present the KIT systems participating in all three official directions, namely English→German, German→English, and English→French, in translation tasks of the IWSLT 2013 machine translation evaluation. Additionally, we present the results for our submissions to the optional directions English→Chinese and English→Arabic. We used phrase-based translation systems to generate the translations. This year, we focused on adapting the systems towards ASR input. Furthermore, we investigated different reordering models as well as an extended discriminative word lexicon. Finally, we added a data selection approach for domain adaptation.

pdf bib
Analyzing the potential of source sentence reordering in statistical machine translation
Teresa Herrmann | Jochen Weiner | Jan Niehues | Alex Waibel
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

We analyze the performance of source sentence reordering, a common reordering approach, using oracle experiments on German-English and English-German translation. First, we show that the potential of this approach is very promising. Compared to a monotone translation, the optimally reordered source sentence leads to improvements of up to 4.6 and 6.2 BLEU points, depending on the language. Furthermore, we perform a detailed evaluation of the different aspects of the approach. We analyze the impact of the restriction of the search space by reordering lattices and we can show that using more complex rule types for reordering results in better approximation of the optimally reordered source. However, a gap of about 3 to 3.8 BLEU points remains, presenting a promising perspective for research on extending the search space through better reordering rules. When evaluating the ranking of different reordering variants, the results reveal that the search for the best path in the lattice performs very well for German-English translation. For English-German translation there is potential for an improvement of up to 1.4 BLEU points through a better ranking of the different reordering possibilities in the reordering lattice.

2012

pdf bib
The KIT translation systems for IWSLT 2012
Mohammed Mediani | Yuqi Zhang | Thanh-Le Ha | Jan Niehues | Eunach Cho | Teresa Herrmann | Rainer Kärgel | Alexander Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we present the KIT systems participating in the English-French TED Translation tasks in the framework of the IWSLT 2012 machine translation evaluation. We also present several additional experiments on the English-German, English-Chinese and English-Arabic translation pairs. Our system is a phrase-based statistical machine translation system, extended with many additional models which were proven to enhance the translation quality. For instance, it uses the part-of-speech (POS)-based reordering, translation and language model adaptation, bilingual language model, word-cluster language model, discriminative word lexica (DWL), and continuous space language model. In addition to this, the system incorporates special steps in the preprocessing and in the post-processing step. In the preprocessing the noisy corpora are filtered by removing the noisy sentence pairs, whereas in the postprocessing the agreement between a noun and its surrounding words in the French translation is corrected based on POS tags with morphological information. Our system deals with speech transcription input by removing case information and punctuation except periods from the text translation model.

pdf bib
Continuous space language models using restricted Boltzmann machines
Jan Niehues | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

We present a novel approach for continuous space language models in statistical machine translation by using Restricted Boltzmann Machines (RBMs). The probability of an n-gram is calculated by the free energy of the RBM instead of a feedforward neural net. Therefore, the calculation is much faster and can be integrated into the translation process instead of using the language model only in a re-ranking step. Furthermore, it is straightforward to introduce additional word factors into the language model. We observed a faster convergence in training if we include automatically generated word classes as an additional word factor. We evaluated the RBM-based language model on the German to English and English to French translation task of TED lectures. Instead of replacing the conventional n-gram-based language model, we trained the RBM-based language model on the more important but smaller in-domain data and combined them in a log-linear way. With this approach we could show improvements of about half a BLEU point on the translation task.

pdf bib
Segmentation and punctuation prediction in speech language translation using a monolingual translation system
Eunah Cho | Jan Niehues | Alex Waibel
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

In spoken language translation (SLT), finding proper segmentation and reconstructing punctuation marks are not only significant but also challenging tasks. In this paper we present our recent work on speech translation quality analysis for German-English by improving sentence segmentation and punctuation. From oracle experiments, we show an upper bound of translation quality if we had human-generated segmentation and punctuation on the output stream of speech recognition systems. In our oracle experiments we gain 1.78 BLEU points of improvements on the lecture test set. We build a monolingual translation system from German to German implementing segmentation and punctuation prediction as a machine translation task. Using the monolingual translation system we get an improvement of 1.53 BLEU points on the lecture test set, which is a comparable performance against the upper bound drawn by the oracle experiments.

pdf bib
Joint WMT 2012 Submission of the QUAERO Project
Markus Freitag | Stephan Peitz | Matthias Huck | Hermann Ney | Jan Niehues | Teresa Herrmann | Alex Waibel | Hai-son Le | Thomas Lavergne | Alexandre Allauzen | Bianka Buschbeck | Josep Maria Crego | Jean Senellart
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
The Karlsruhe Institute of Technology Translation Systems for the WMT 2012
Jan Niehues | Yuqi Zhang | Mohammed Mediani | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Marcello Federico | Sebastian Stüker | Luisa Bentivogli | Michael Paul | Mauro Cettolo | Teresa Herrmann | Jan Niehues | Giovanni Moretti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series. That IWSLT 2011 evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 evaluation campaign, and describes the data supplied, the evaluation infrastructure made available to participants, and the subjective evaluation carried out.

pdf bib
Detailed Analysis of Different Strategies for Phrase Table Adaptation in SMT
Jan Niehues | Alex Waibel
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper gives a detailed analysis of different approaches to adapt a statistical machine translation system towards a target domain using small amounts of parallel in-domain data. Therefore, we investigate the differences between the approaches addressing adaptation on the two main steps of building a translation model: The candidate selection and the phrase scoring. For the latter step we characterized the differences by four key aspects. We performed experiments on two different tasks of speech translation and analyzed the influence of the different aspects on the overall translation quality. On both tasks we could show significant improvements by using the presented adaptation techniques.

2011

pdf bib
Wider Context by Using Bilingual Language Models in Machine Translation
Jan Niehues | Teresa Herrmann | Stephan Vogel | Alex Waibel
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Joint WMT Submission of the QUAERO Project
Markus Freitag | Gregor Leusch | Joern Wuebker | Stephan Peitz | Hermann Ney | Teresa Herrmann | Jan Niehues | Alex Waibel | Alexandre Allauzen | Gilles Adda | Josep Maria Crego | Bianka Buschbeck | Tonio Wandmacher | Jean Senellart
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011
Teresa Herrmann | Mohammed Mediani | Jan Niehues | Alex Waibel
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
The KIT English-French translation systems for IWSLT 2011
Mohammed Mediani | Eunach Cho | Jan Niehues | Teresa Herrmann | Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper presents the KIT system participating in the English→French TALK Translation tasks in the framework of the IWSLT 2011 machine translation evaluation. Our system is a phrase-based translation system using POS-based reordering extended with many additional features. First of all, a special preprocessing is devoted to the Giga corpus in order to minimize the effect of the great amount of noise it contains. In addition, the system gives more importance to the in-domain data by adapting the translation and the language models as well as by using a wordcluster language model. Furthermore, the system is extended by a bilingual language model and a discriminative word lexicon. The automatic speech transcription input usually has no or wrong punctuation marks, therefore these marks were especially removed from the source training data for the SLT system training.

pdf bib
Advances on spoken language translation in the Quaero program
Karim Boudahmane | Bianka Buschbeck | Eunah Cho | Josep Maria Crego | Markus Freitag | Thomas Lavergne | Hermann Ney | Jan Niehues | Stephan Peitz | Jean Senellart | Artem Sokolov | Alex Waibel | Tonio Wandmacher | Joern Wuebker | François Yvon
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

The Quaero program is an international project promoting research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Within the program framework, research organizations and industrial partners collaborate to develop prototypes of innovating applications and services for access and usage of multimedia data. One of the topics addressed is the translation of spoken language. Each year, a project-internal evaluation is conducted by DGA to monitor the technological advances. This work describes the design and results of the 2011 evaluation campaign. The participating partners were RWTH, KIT, LIMSI and SYSTRAN. Their approaches are compared on both ASR output and reference transcripts of speech data for the translation between French and German. The results show that the developed techniques further the state of the art and improve translation quality.

pdf bib
Using Wikipedia to translate domain-specific terms in SMT
Jan Niehues | Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

When building a university lecture translation system, one important step is to adapt it to the target domain. One problem in this adaptation task is to acquire translations for domain specific terms. In this approach we tried to get these translations from Wikipedia, which provides articles on very specific topics in many different languages. To extract translations for the domain specific terms, we used the interlanguage links of Wikipedia . We analyzed different methods to integrate this corpus into our system and explored methods to disambiguate between different translations by using the text of the articles. In addition, we developed methods to handle different morphological forms of the specific terms in morphologically rich input languages like German. The results show that the number of out-of-vocabulary (OOV) words could be reduced by 50% on computer science lectures and the translation quality could be improved by more than 1 BLEU point.

2010

pdf bib
Domain Adaptation in Statistical Machine Translation using Factored Translation Models
Jan Niehues | Alex Waibel
Proceedings of the 14th Annual conference of the European Association for Machine Translation

pdf bib
The Karlsruhe Institute for Technology Translation System for the ACL-WMT 2010
Jan Niehues | Teresa Herrmann | Mohammed Mediani | Alex Waibel
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
The KIT translation system for IWSLT 2010
Jan Niehues | Mohammed Mediani | Teresa Herrmann | Michael Heck | Christian Herff | Alex Waibel
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

2009

pdf bib
The Universität Karlsruhe Translation System for the EACL-WMT 2009
Jan Niehues | Teresa Herrmann | Muntsin Kolss | Alex Waibel
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
A POS-Based Model for Long-Range Reorderings in SMT
Jan Niehues | Muntsin Kolss
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

pdf bib
Discriminative Word Alignment via Alignment Matrix Modeling
Jan Niehues | Stephan Vogel
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Simultaneous German-English lecture translation.
Muntsin Kolss | Matthias Wölfel | Florian Kraft | Jan Niehues | Matthias Paulik | Alex Waibel
Proceedings of the 5th International Workshop on Spoken Language Translation: Papers

In an increasingly globalized world, situations in which people of different native tongues have to communicate with each other become more and more frequent. In many such situations, human interpreters are prohibitively expensive or simply not available. Automatic spoken language translation (SLT), as a cost-effective solution to this dilemma, has received increased attention in recent years. For a broad number of applications, including live SLT of lectures and oral presentations, these automatic systems should ideally operate in real time and with low latency. Large and highly specialized vocabularies as well as strong variations in speaking style – ranging from read speech to free presentations suffering from spontaneous events – make simultaneous SLT of lectures a challenging task. This paper presents our progress in building a simultaneous German-English lecture translation system. We emphasize some of the challenges which are particular to this language pair and propose solutions to tackle some of the problems encountered.

2007

pdf bib
The ISL Phrase-Based MT System for the 2007 ACL Workshop on Statistical Machine Translation
Matthias Paulik | Kay Rottmann | Jan Niehues | Silja Hildebrand | Stephan Vogel
Proceedings of the Second Workshop on Statistical Machine Translation

Search
Co-authors