Jan-Thorsten Peter

2025

We investigate the tradeoff between adequacy and fluency in machine translation. We show the severity of this tradeoff at the evaluation level and analyze where popular metrics fall within it. Current metrics generally lean toward adequacy: their scores correlate more strongly with the adequacy of translations than with their fluency. More importantly, we find that this tradeoff also persists at the meta-evaluation level, and that the standard WMT meta-evaluation favors adequacy-oriented metrics over fluency-oriented ones. We show that this bias is partially attributable to the composition of the systems included in the meta-evaluation datasets. To control this bias, we propose a method that synthesizes translation systems in meta-evaluation. Our findings highlight the importance of understanding this tradeoff in meta-evaluation and its impact on metric rankings.

Large Language Models have shown impressive multilingual capabilities, with translation being one among many tasks. Google Translate’s submission to the 2025 WMT evaluation investigates how these models behave when their translation performance is pushed to the limit. Starting with the strong Gemma 3 model, we carry out supervised fine-tuning on high-quality, synthetically generated parallel data. Afterwards we perform an additional reinforcement learning step, with reward models based on translation metrics, to push the translation capabilities even further. By controlling the combination of reward models, including reference-based and quality estimation metrics, we found that the behaviour of the model could be tailored towards a more literal or a more creative translation style. Our two submissions correspond to those two models. We chose the more creative system as our primary submission, targeting a human preference for better-sounding, more naturally flowing text, although at the risk of losing some translation accuracy. Finding the sweet spot between these two dimensions remains an open question, and the answer will certainly depend on the specific domain and on user preferences.

2023

Quality Estimation (QE), the evaluation of machine translation output without the need for explicit references, has seen big improvements in recent years with the use of neural metrics. In this paper we analyze the viability of using QE metrics for filtering out low-quality sentence pairs in the training data of neural machine translation (NMT) systems. While most corpus filtering methods focus on detecting noisy examples in collections of texts, usually huge amounts of web-crawled data, QE models are trained to discriminate more fine-grained quality differences. We show that by selecting the highest-quality sentence pairs in the training data, we can improve translation quality while reducing the training size by half. We also provide a detailed analysis of the filtering results, which highlights the differences between the two approaches.
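
The selection step described in this abstract can be illustrated with a minimal sketch. It assumes that every sentence pair already carries a QE score and that higher scores mean better quality; the function name `filter_top_fraction`, the toy corpus, and the score values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def filter_top_fraction(
    scored_pairs: List[Tuple[str, str, float]],
    keep_fraction: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep the best-scoring fraction of (source, target, qe_score) triples."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not scored_pairs:
        return []
    # Assumption: a higher QE score means a better sentence pair.
    ranked = sorted(scored_pairs, key=lambda triple: triple[2], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return [(src, tgt) for src, tgt, _ in ranked[:n_keep]]


# Toy data with made-up scores, for illustration only.
corpus = [
    ("Guten Morgen", "Good morning", 0.91),
    ("Danke", "Click here to subscribe", 0.12),
    ("Wie geht's?", "How are you?", 0.87),
    ("Hallo Welt", "Hello wrold", 0.55),
]
filtered = filter_top_fraction(corpus, keep_fraction=0.5)
```

With `keep_fraction=0.5` the training set is halved, mirroring the reduction mentioned in the abstract; in practice the scores would come from a trained QE model rather than being hard-coded.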

2018

This work describes AppTek’s speech translation pipeline that includes strong state-of-the-art automatic speech recognition (ASR) and neural machine translation (NMT) components. We show how these components can be tightly coupled by encoding ASR confusion networks, as well as ASR-like noise adaptation, vocabulary normalization, and implicit punctuation prediction during translation. In another experimental setup, we propose a direct speech translation approach that can be scaled to translation tasks with large amounts of text-only parallel training data but a limited number of hours of recorded and human-translated speech.

Sisyphus, a Workflow Manager Designed for Machine Translation and Automatic Speech Recognition
Jan-Thorsten Peter | Eugen Beck | Hermann Ney
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Training and testing many possible parameters or model architectures of state-of-the-art machine translation or automatic speech recognition systems is a cumbersome task. Such experiments usually require a long pipeline of commands, ranging from pre-processing the training data to post-processing and evaluating the output.

2017

2016

Guided Alignment Training for Topic-Aware Neural Machine Translation
Wenhu Chen | Evgeny Matusov | Shahram Khadivi | Jan-Thorsten Peter
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

In this paper, we propose an effective way of biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that metadata associated with input texts, such as topic or category information, can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general-domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.
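
As a rough illustration of what biasing attention towards word alignments can look like, the sketch below computes a cross-entropy penalty between a model's attention weights and a reference alignment matrix. The abstract does not state the exact loss, so the cross-entropy form, the function name `guided_alignment_loss`, and the toy matrices are assumptions made for illustration only.

```python
import math
from typing import List


def guided_alignment_loss(
    attention: List[List[float]],
    alignment: List[List[float]],
    eps: float = 1e-9,
) -> float:
    """Cross-entropy between reference alignments and attention weights.

    Both matrices have one row per target word and one column per source word.
    The result is averaged over the target words that have at least one link.
    """
    total = 0.0
    guided_rows = 0
    for att_row, ali_row in zip(attention, alignment):
        row_sum = sum(ali_row)
        if row_sum == 0:
            continue  # Unaligned target word: no guidance for this row.
        guided_rows += 1
        for att, link in zip(att_row, ali_row):
            # Normalize the alignment row into a distribution over source words.
            total -= (link / row_sum) * math.log(att + eps)
    return total / guided_rows if guided_rows else 0.0


# Toy example: two target words, two source words, monotone alignment.
alignment = [[1.0, 0.0], [0.0, 1.0]]
loss_close = guided_alignment_loss([[0.9, 0.1], [0.2, 0.8]], alignment)
loss_far = guided_alignment_loss([[0.1, 0.9], [0.8, 0.2]], alignment)
```

Attention that agrees with the alignment yields the smaller penalty (`loss_close < loss_far`). In training, a term like this would typically be added to the usual translation loss with an interpolation weight, which is again an assumption rather than a detail given in the abstract.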

The RWTH Aachen Machine Translation System for IWSLT 2016
Jan-Thorsten Peter | Andreas Guta | Nick Rossenbach | Miguel Graça | Hermann Ney
Proceedings of the 13th International Conference on Spoken Language Translation

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2016. We have participated in the MT track for the German→English language pair, employing our state-of-the-art phrase-based system, our neural machine translation implementation and our joint translation and reordering decoder. Furthermore, we have applied feed-forward and recurrent neural language and translation models for reranking. The attention-based approach has been used for reranking the n-best lists for both phrase-based and hierarchical setups. On top of these systems, we make use of system combination to enhance the translation quality by combining individually trained systems.
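
The n-best reranking mentioned in this abstract can be sketched as a weighted sum of model scores per hypothesis. The feature names (`decoder`, `neural_lm`), the weights, and the scores below are hypothetical; the actual feature set and weight tuning of the RWTH systems are not given in the abstract.

```python
from typing import Dict, List, Tuple


def rerank(
    nbest: List[Tuple[str, Dict[str, float]]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Sort hypotheses by a weighted sum of their feature scores, best first."""
    scored = []
    for hypothesis, features in nbest:
        # Features without a weight contribute nothing to the total score.
        score = sum(weights.get(name, 0.0) * value for name, value in features.items())
        scored.append((hypothesis, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy n-best list with made-up log-probability scores.
nbest = [
    ("the house is small", {"decoder": -4.0, "neural_lm": -6.0}),
    ("the house is little", {"decoder": -3.8, "neural_lm": -7.5}),
    ("small is the house", {"decoder": -4.5, "neural_lm": -9.0}),
]
weights = {"decoder": 1.0, "neural_lm": 0.5}
reranked = rerank(nbest, weights)
best = reranked[0][0]
```

In this toy list the decoder score alone would prefer "the house is little", while adding the extra model score moves "the house is small" to the top, which is the effect reranking with additional models is meant to have.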

Exponentially Decaying Bag-of-Words Input Features for Feed-Forward Neural Network in Statistical Machine Translation
Jan-Thorsten Peter | Weiyue Wang | Hermann Ney
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The RWTH Aachen University English-Romanian Machine Translation System for WMT 2016
Jan-Thorsten Peter | Tamer Alkhouli | Andreas Guta | Hermann Ney
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

CharacTer: Translation Edit Rate on Character Level
Weiyue Wang | Jan-Thorsten Peter | Hendrik Rosendahl | Hermann Ney
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

The RWTH Aachen machine translation system for IWSLT 2015
Jan-Thorsten Peter | Farzad Toutounchi | Stephan Peitz | Parnia Bahar | Andreas Guta | Hermann Ney
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

A Comparison between Count and Neural Network Models Based on Joint Translation and Reordering Sequences
Andreas Guta | Tamer Alkhouli | Jan-Thorsten Peter | Joern Wuebker | Hermann Ney
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

The RWTH Aachen German-English Machine Translation System for WMT 2015
Jan-Thorsten Peter | Farzad Toutounchi | Joern Wuebker | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

Local System Voting Feature for Machine Translation System Combination
Markus Freitag | Jan-Thorsten Peter | Stephan Peitz | Minwei Feng | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

2013

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2013. We participated in the English→French, English↔German, Arabic→English, Chinese→English and Slovenian↔English MT tracks and the English→French and English→German SLT tracks. We apply phrase-based and hierarchical SMT decoders, which are augmented by state-of-the-art extensions. The novel techniques we experimentally evaluate include discriminative phrase training, a continuous space language model, a hierarchical reordering model, a word class language model, domain adaptation via data selection and system combination of standard and reverse order models. By applying these methods, we show considerable improvements over the respective baseline systems.

Advancements in Reordering Models for Statistical Machine Translation
Minwei Feng | Jan-Thorsten Peter | Hermann Ney
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Sequence labeling-based reordering model for phrase-based SMT
Minwei Feng | Jan-Thorsten Peter | Hermann Ney
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

For current statistical machine translation systems, reordering is still a major problem for language pairs like Chinese-English, where the source and target languages have significant word order differences. In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. For a given source sentence, we assign each source token a label which contains the reordering information for that token. We also design an unaligned word tag so that the unaligned word phenomenon is automatically incorporated into the proposed model. Our reordering model is conditioned on the whole source sentence. Hence it is able to capture long-range dependencies in the source sentence. Although training on large-scale tasks requires notable amounts of computational resources, the decoder makes use of the tagging information only as soft constraints. Therefore, the training procedure of our model is computationally expensive for large tasks, while in the test phase (during translation) our model is very efficient. We carried out experiments on five Chinese-English NIST tasks trained with BOLT data. Results show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average.

2011

Soft string-to-dependency hierarchical machine translation
Jan-Thorsten Peter | Matthias Huck | Hermann Ney | Daniel Stein
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

In this paper, we dissect the influence of several target-side dependency-based extensions to hierarchical machine translation, including a dependency language model (LM). We pursue a non-restrictive approach that does not prohibit the production of hypotheses with malformed dependency structures. Since many questions remained open from previous and related work, we offer in-depth analysis of the influence of the language model order, the impact of dependency-based restrictions on the search space, and the information to be gained from dependency tree building during decoding. The application of a non-restrictive approach together with an integrated dependency LM scoring is a novel contribution which yields significant improvements for two large-scale translation tasks for the language pairs Chinese–English and German–French.