Markus Freitag


2021

Assessing Reference-Free Peer Evaluation for Machine Translation
Sweta Agrawal | George Foster | Markus Freitag | Colin Cherry
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Reference-free evaluation has the potential to make machine translation evaluation substantially more scalable, allowing us to pivot easily to new languages or domains. It has recently been shown that the probabilities given by a large, multilingual model can achieve state-of-the-art results when used as a reference-free metric. We experiment with various modifications to this model, and demonstrate that by scaling it up we can match the performance of BLEU. We analyze various potential weaknesses of the approach, and find that it is surprisingly robust and likely to offer reasonable performance across a broad spectrum of domains and different system qualities.

2020

Translationese as a Language in “Multilingual” NMT
Parker Riley | Isaac Caswell | Markus Freitag | David Grangier
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Machine translation has an undesirable propensity to produce “translationese” artifacts, which can lead to higher BLEU scores while being liked less by human raters. Motivated by this, we model translationese and original (i.e. natural) text as separate languages in a multilingual model, and pose the question: can we perform zero-shot translation between original source text and original target text? There is no data with original source and original target, so we train a sentence-level classifier to distinguish translationese from original target text, and use this classifier to tag the training data for an NMT model. Using this technique we bias the model to produce more natural outputs at test time, yielding gains in human evaluation scores on both accuracy and fluency. Additionally, we demonstrate that it is possible to bias the model to produce translationese and game the BLEU score, increasing it while decreasing human-rated quality. We analyze these outputs using metrics measuring the degree of translationese, and present an analysis of the volatility of heuristic-based train-data tagging.
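
The tagging step described in this abstract can be pictured with a minimal sketch. It assumes the tag is prepended to the source sentence, as is common in multilingual NMT; the abstract does not say where the tag goes. The tag strings `<trans>` and `<orig>`, the function names, and the toy rule standing in for the trained classifier are all hypothetical.

```python
def tag_training_pair(source, target, is_translationese):
    """Prepend a tag to the source that records how the target was classified."""
    # Assumption: tags are hypothetical tokens added in front of the source text.
    tag = "<trans>" if is_translationese(target) else "<orig>"
    return f"{tag} {source}", target


def toy_classifier(sentence):
    """Stand-in for a trained sentence-level classifier (hypothetical rule)."""
    return "in order to" in sentence


pairs = [
    ("Er kam, um zu helfen.", "He came in order to help."),
    ("Er kam, um zu helfen.", "He came to help."),
]
tagged = [tag_training_pair(src, tgt, toy_classifier) for src, tgt in pairs]
print(tagged)
```

In this sketch, prepending `<orig>` to a source sentence at test time would be the way to request more natural output from a model trained on the tagged data.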

Complete Multilingual Neural Machine Translation
Markus Freitag | Orhan Firat
Proceedings of the Fifth Conference on Machine Translation

Multilingual Neural Machine Translation (MNMT) models are commonly trained on a joint set of bilingual corpora which is acutely English-centric (i.e. English either as source or target language). While direct data between two languages that are non-English is explicitly available at times, its use is not common. In this paper, we first take a step back and look at the commonly used bilingual corpora (WMT), and resurface the existence and importance of implicit structure that existed in it: multi-way alignment across examples (the same sentence in more than two languages). We set out to study the use of multi-way aligned examples in order to enrich the original English-centric parallel corpora. We reintroduce this direct parallel data from multi-way aligned corpora between all source and target languages. By doing so, the English-centric graph expands into a complete graph, every language pair being connected. We call MNMT with such connectivity pattern complete Multilingual Neural Machine Translation (cMNMT) and demonstrate its utility and efficacy with a series of experiments and analysis. In combination with a novel training data sampling strategy that is conditioned on the target language only, cMNMT yields competitive translation quality for all language pairs. We further study the size effect of multi-way aligned data, its transfer learning capabilities and how it eases adding a new language in MNMT. Finally, we stress test cMNMT at scale and demonstrate that we can train a cMNMT model with up to 12,432 language pairs that provides competitive translation quality for all language pairs.
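
As a rough illustration of the connectivity argument, the sketch below counts directed language pairs in an English-centric graph and in a complete graph. The figure of 112 languages is an assumption inferred from 112 × 111 = 12,432; the abstract itself does not state the number of languages, and the function names are illustrative.

```python
def english_centric_pairs(num_languages: int) -> int:
    """Directed pairs when English must be either the source or the target."""
    return 2 * (num_languages - 1)


def complete_pairs(num_languages: int) -> int:
    """Directed pairs when every language is connected to every other one."""
    return num_languages * (num_languages - 1)


# Assumption: 112 languages, inferred from the pair count in the abstract.
n = 112
print(english_centric_pairs(n))  # 222
print(complete_pairs(n))         # 12432
```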

Findings of the WMT 2020 Shared Task on Automatic Post-Editing
Rajen Chatterjee | Markus Freitag | Matteo Negri | Marco Turchi
Proceedings of the Fifth Conference on Machine Translation

We present the results of the 6th round of the WMT task on MT Automatic Post-Editing. The task consists in automatically correcting the output of a “black-box” machine translation system by learning from existing human corrections of different sentences. This year, the challenge consisted of fixing the errors present in English Wikipedia pages translated into German and Chinese by state-of-the-art, not domain-adapted neural MT (NMT) systems unknown to participants. Six teams participated in the English-German task, submitting a total of 11 runs. Two teams participated in the English-Chinese task, submitting 2 runs each. Due to i) the different source/domain of data compared to the past (Wikipedia vs Information Technology), ii) the different quality of the initial translations to be corrected and iii) the introduction of a new language pair (English-Chinese), this year’s results are not directly comparable with last year’s round. However, on both language directions, participants’ submissions show considerable improvements over the baseline results. On English-German, the top ranked system improves over the baseline by -11.35 TER and +16.68 BLEU points, while on English-Chinese the improvements are respectively up to -12.13 TER and +14.57 BLEU points. Overall, coherent gains are also highlighted by the outcomes of human evaluation, which confirms the effectiveness of APE to improve MT quality, especially in the new generic domain selected for this year’s round.

Results of the WMT20 Metrics Shared Task
Nitika Mathur | Johnny Wei | Markus Freitag | Qingsong Ma | Ondřej Bojar
Proceedings of the Fifth Conference on Machine Translation

This paper presents the results of the WMT20 Metrics Shared Task. Participants were asked to score the outputs of the translation systems competing in the WMT20 News Translation Task with automatic metrics. Ten research groups submitted 27 metrics, four of which are reference-less “metrics”. In addition, we computed five baseline metrics, including sentBLEU, BLEU and TER, using the SacreBLEU scorer. All metrics were evaluated on how well they correlate at the system-, document- and segment-level with the WMT20 official human scores. We present an extensive analysis of the influence of different reference translations on metric reliability and of how well automatic metrics score human translations, and we also flag major discrepancies between metric and human scores when evaluating MT systems. Finally, we investigate whether we can use automatic metrics to flag incorrect human ratings.
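
To make "correlate at the system level" concrete, here is a minimal sketch that correlates one metric score per system with one human score per system. Pearson correlation is used as an assumption; the abstract does not name the correlation statistic, and the scores below are invented for illustration and are not WMT20 data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical system-level scores for four MT systems (not real WMT20 data).
metric_scores = [31.2, 28.4, 35.0, 29.9]
human_scores = [0.12, -0.05, 0.31, 0.02]
print(round(pearson(metric_scores, human_scores), 3))
```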

Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
Thibault Sellam | Amy Pu | Hyung Won Chung | Sebastian Gehrmann | Qijun Tan | Markus Freitag | Dipanjan Das | Ankur Parikh
Proceedings of the Fifth Conference on Machine Translation

The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. We make several submissions based on BLEURT, a previously published metric which uses transfer learning. We extend the metric beyond English and evaluate it on 14 language pairs for which fine-tuning data is available, as well as 4 “zero-shot” language pairs, for which we have no labelled examples. Additionally, we focus on English to German and demonstrate how to combine BLEURT’s predictions with those of YiSi and use alternative reference translations to enhance the performance. Empirical results show that the models achieve competitive results on the WMT Metrics 2019 Shared Task, indicating their promise for the 2020 edition.

Human-Paraphrased References Improve Neural Machine Translation
Markus Freitag | George Foster | David Grangier | Colin Cherry
Proceedings of the Fifth Conference on Machine Translation

Automatic evaluation comparing candidate translations to human-generated paraphrases of reference translations has recently been proposed by Freitag et al. (2020). When used in place of original references, the paraphrased versions produce metric scores that correlate better with human judgment. This effect holds for a variety of different automatic metrics, and tends to favor natural formulations over more literal (translationese) ones. In this paper we compare the results of performing end-to-end system development using standard and paraphrased references. With state-of-the-art English-German NMT components, we show that tuning to paraphrased references produces a system that is significantly better according to human judgment, but 5 BLEU points worse when tested on standard references. Our work confirms the finding that paraphrased references yield metric scores that correlate better with human judgment, and demonstrates for the first time that using these scores for system development can lead to significant improvements.

BLEU might be Guilty but References are not Innocent
Markus Freitag | David Grangier | Isaac Caswell
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The quality of automatic metrics for machine translation has been increasingly called into question, especially for high-quality systems. This paper demonstrates that, while choice of metric is important, the nature of the references is also critical. We study different methods to collect references and compare their value in automated evaluation by reporting correlation with human evaluation for a variety of systems and metrics. Motivated by the finding that typical references exhibit poor diversity, concentrating around translationese language, we develop a paraphrasing task for linguists to perform on existing reference translations, which counteracts this bias. Our method yields higher correlation with human judgment not only for the submissions of WMT 2019 English to German, but also for Back-translation and APE augmented MT output, which have been shown to have low correlation with automatic metrics using standard references. We demonstrate that our methodology improves correlation with all modern evaluation metrics we look at, including embedding-based methods. To complete this picture, we reveal that multi-reference BLEU does not improve the correlation for high quality output, and present an alternative multi-reference formulation that is more effective.

KoBE: Knowledge-Based Machine Translation Evaluation
Zorik Gekhman | Roee Aharoni | Genady Beryozkin | Markus Freitag | Wolfgang Macherey
Findings of the Association for Computational Linguistics: EMNLP 2020

We propose a simple and effective method for machine translation evaluation which does not require reference translations. Our approach is based on (1) grounding the entity mentions found in each source sentence and candidate translation against a large-scale multilingual knowledge base, and (2) measuring the recall of the grounded entities found in the candidate vs. those found in the source. Our approach achieves the highest correlation with human judgements on 9 out of the 18 language pairs from the WMT19 benchmark for evaluation without references, which is the largest number of wins for a single evaluation method on this task. On 4 language pairs, we also achieve higher correlation with human judgements than BLEU. To foster further research, we release a dataset containing 1.8 million grounded entity mentions across 18 language pairs from the WMT19 metrics track data.
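
Step (2) of the approach, entity recall, can be sketched as follows. This is a simplified illustration and not the released KoBE implementation: entity linking (step 1) is assumed to have already produced lists of knowledge-base IDs, the IDs shown are hypothetical, and counting repeated mentions with multiplicity and scoring an entity-free source as 1.0 are both assumptions.

```python
from collections import Counter


def entity_recall(source_entities, candidate_entities):
    """Fraction of source entity mentions that are matched in the candidate.

    Both arguments are lists of knowledge-base IDs; repeated mentions are
    counted with multiplicity (an assumption of this sketch).
    """
    source_counts = Counter(source_entities)
    candidate_counts = Counter(candidate_entities)
    total = sum(source_counts.values())
    if total == 0:
        return 1.0  # Assumption: nothing to recall counts as a perfect score.
    matched = sum(min(count, candidate_counts[entity])
                  for entity, count in source_counts.items())
    return matched / total


source = ["Q64", "Q183", "Q567"]  # hypothetical IDs grounded in the source
candidate = ["Q64", "Q183"]       # hypothetical IDs grounded in the candidate
print(entity_recall(source, candidate))
```

Because both sides are mapped to language-independent IDs, the comparison needs no reference translation, which is the property the abstract emphasizes.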

2019

APE at Scale and Its Implications on MT Evaluation Biases
Markus Freitag | Isaac Caswell | Scott Roy
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

In this work, we train an Automatic Post-Editing (APE) model and use it to reveal biases in standard MT evaluation procedures. The goal of our APE model is to correct typical errors introduced by the translation process, and convert the “translationese” output into natural text. Our APE model is trained entirely on monolingual data that has been round-trip translated through English, to mimic errors that are similar to the ones introduced by NMT. We apply our model to the output of existing NMT systems, and demonstrate that, while the human-judged quality improves in all cases, BLEU scores drop with forward-translated test sets. We verify these results for the WMT18 English to German, WMT15 English to French, and WMT16 English to Romanian tasks. Furthermore, we selectively apply our APE model on the output of the top submissions of the most recent WMT evaluation campaigns. We see quality improvements on all tasks of up to 2.5 BLEU points.

2018

Unsupervised Natural Language Generation with Denoising Autoencoders
Markus Freitag | Scott Roy
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Generating text from structured data is important for various tasks such as question answering and dialog systems. We show that in at least one domain, without any supervision and only based on unlabeled text, we are able to build a Natural Language Generation (NLG) system with higher performance than supervised approaches. In our approach, we interpret the structured data as a corrupt representation of the desired output and use a denoising auto-encoder to reconstruct the sentence. We show how to introduce noise into training examples that do not contain structured data, and that the resulting denoising auto-encoder generalizes to generate correct sentences when given structured data.

2017

Beam Search Strategies for Neural Machine Translation
Markus Freitag | Yaser Al-Onaizan
Proceedings of the First Workshop on Neural Machine Translation

The basic concept in Neural Machine Translation (NMT) is to train a large Neural Network that maximizes the translation performance on a given parallel corpus. NMT then uses a simple left-to-right beam-search decoder to generate new translations that approximately maximize the trained conditional probability. The current beam search strategy generates the target sentence word by word from left to right while keeping a fixed number of active candidates at each time step. First, this simple search is less adaptive as it also expands candidates whose scores are much worse than the current best. Secondly, it does not expand hypotheses if they are not within the best scoring candidates, even if their scores are close to the best one. The latter can be avoided by increasing the beam size until no performance improvement can be observed. While this can yield better performance, it has the drawback of a slower decoding speed. In this paper, we concentrate on speeding up the decoder by applying a more flexible beam search strategy whose candidate size may vary at each time step depending on the candidate scores. We speed up the original decoder by up to 43% for the two language pairs German to English and Chinese to English without losing any translation quality.
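
One way a candidate set can vary in size with the scores is to drop hypotheses that fall too far behind the current best. The sketch below shows such a pruning step for a single time step. It is an illustrative rule under stated assumptions, not necessarily the exact strategy evaluated in the paper; the function name, the `max_gap` parameter, and the example scores are hypothetical.

```python
def prune_candidates(candidates, beam_size, max_gap):
    """Keep at most beam_size candidates whose log-score is within max_gap of the best.

    candidates: list of (hypothesis, log_score) tuples for one time step.
    """
    if not candidates:
        return []
    # Rank from best to worst log-score.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    best_score = ranked[0][1]
    # Score-dependent pruning: drop anything too far behind the best candidate.
    kept = [item for item in ranked if best_score - item[1] <= max_gap]
    # The fixed beam size still acts as an upper bound.
    return kept[:beam_size]


step = [("the cat", -0.4), ("a cat", -0.9), ("the hat", -3.5), ("cat the", -6.0)]
print(prune_candidates(step, beam_size=3, max_gap=1.0))
```

With these example scores only the two hypotheses within 1.0 of the best survive, so fewer candidates are expanded than the fixed beam size of 3 would allow.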

2015

Local System Voting Feature for Machine Translation System Combination
Markus Freitag | Jan-Thorsten Peter | Stephan Peitz | Minwei Feng | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Combined spoken language translation
Markus Freitag | Joern Wuebker | Stephan Peitz | Hermann Ney | Matthias Huck | Alexandra Birch | Nadir Durrani | Philipp Koehn | Mohammed Mediani | Isabel Slawik | Jan Niehues | Eunah Cho | Alex Waibel | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. One of the collaborative efforts within EU-BRIDGE is to produce joint submissions of up to four different partners to the evaluation campaign at the 2014 International Workshop on Spoken Language Translation (IWSLT). We submitted combined translations to the German→English spoken language translation (SLT) track as well as to the German→English, English→German and English→French machine translation (MT) tracks. In this paper, we present the techniques which were applied by the different individual translation systems of RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show the combination approach developed at RWTH Aachen University which combined the individual systems. The consensus translations yield empirical gains of up to 2.3 points in BLEU and 1.2 points in TER compared to the best individual system.

Better punctuation prediction with hierarchical phrase-based translation
Stephan Peitz | Markus Freitag | Hermann Ney
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

Punctuation prediction is an important task in spoken language translation and can be performed by using a monolingual phrase-based translation system to translate from unpunctuated text to text with punctuation. However, a punctuation prediction system based on phrase-based translation is not able to capture long-range dependencies between words and punctuation marks. In this paper, we propose to employ hierarchical translation in place of phrase-based translation and show that this approach is more robust for unseen word sequences. Furthermore, we analyze different optimization criteria for tuning the scaling factors of a monolingual statistical machine translation system. In our experiments, we compare the new approach with other punctuation prediction methods and show improvements in terms of F1-Score and BLEU on the IWSLT 2014 German→English and English→French translation tasks.

EU-BRIDGE MT: Combined Machine Translation
Markus Freitag | Stephan Peitz | Joern Wuebker | Hermann Ney | Matthias Huck | Rico Sennrich | Nadir Durrani | Maria Nadejde | Philip Williams | Philipp Koehn | Teresa Herrmann | Eunah Cho | Alex Waibel
Proceedings of the Ninth Workshop on Statistical Machine Translation

The RWTH Aachen German-English Machine Translation System for WMT 2014
Stephan Peitz | Joern Wuebker | Markus Freitag | Hermann Ney
Proceedings of the Ninth Workshop on Statistical Machine Translation

Jane: Open Source Machine Translation System Combination
Markus Freitag | Matthias Huck | Hermann Ney
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Reverse Word Order Model
Markus Freitag | Minwei Feng | Matthias Huck | Stephan Peitz | Hermann Ney
Proceedings of Machine Translation Summit XIV: Papers

The RWTH Aachen machine translation systems for IWSLT 2013
Joern Wuebker | Stephan Peitz | Tamer Alkhouli | Jan-Thorsten Peter | Minwei Feng | Markus Freitag | Hermann Ney
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign International Workshop on Spoken Language Translation (IWSLT) 2013. We participated in the English→French, English↔German, Arabic→English, Chinese→English and Slovenian↔English MT tracks and the English→French and English→German SLT tracks. We apply phrase-based and hierarchical SMT decoders, which are augmented by state-of-the-art extensions. The novel techniques we experimentally evaluate include discriminative phrase training, a continuous space language model, a hierarchical reordering model, a word class language model, domain adaptation via data selection and system combination of standard and reverse order models. By application of these methods we can show considerable improvements over the respective baseline systems.

EU-BRIDGE MT: text translation of talks in the EU-BRIDGE project
Markus Freitag | Stephan Peitz | Joern Wuebker | Hermann Ney | Nadir Durrani | Matthias Huck | Philipp Koehn | Thanh-Le Ha | Jan Niehues | Mohammed Mediani | Teresa Herrmann | Alex Waibel | Nicola Bertoldi | Mauro Cettolo | Marcello Federico
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. This paper describes one of the collaborative efforts within EU-BRIDGE to further advance the state of the art in machine translation between two European language pairs, English→French and German→English. Four research institutions involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the machine translation track of the evaluation campaign at the 2013 International Workshop on Spoken Language Translation (IWSLT). We present the methods and techniques to achieve high translation quality for text translation of talks which are applied at RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show how we have been able to considerably boost translation performance (as measured in terms of the metrics BLEU and TER) by means of system combination. The joint setups yield empirical gains of up to 1.4 points in BLEU and 2.8 points in TER on the IWSLT test sets compared to the best single systems.

A Performance Study of Cube Pruning for Large-Scale Hierarchical Machine Translation
Matthias Huck | David Vilar | Markus Freitag | Hermann Ney
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

Joint WMT 2013 Submission of the QUAERO Project
Stephan Peitz | Saab Mansour | Matthias Huck | Markus Freitag | Hermann Ney | Eunah Cho | Teresa Herrmann | Mohammed Mediani | Jan Niehues | Alex Waibel | Alexander Allauzen | Quoc Khanh Do | Bianka Buschbeck | Tonio Wandmacher
Proceedings of the Eighth Workshop on Statistical Machine Translation

The RWTH Aachen Machine Translation System for WMT 2013
Stephan Peitz | Saab Mansour | Jan-Thorsten Peter | Christoph Schmidt | Joern Wuebker | Matthias Huck | Markus Freitag | Hermann Ney
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Joern Wuebker | Matthias Huck | Stephan Peitz | Malte Nuhn | Markus Freitag | Jan-Thorsten Peter | Saab Mansour | Hermann Ney
Proceedings of COLING 2012: Demonstration Papers

Discriminative Reordering Extensions for Hierarchical Phrase-Based Machine Translation
Matthias Huck | Stephan Peitz | Markus Freitag | Hermann Ney
Proceedings of the 16th Annual conference of the European Association for Machine Translation

The RWTH Aachen speech recognition and machine translation system for IWSLT 2012
Stephan Peitz | Saab Mansour | Markus Freitag | Minwei Feng | Matthias Huck | Joern Wuebker | Malte Nuhn | Markus Nußbaum-Thom | Hermann Ney
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, the automatic speech recognition (ASR) and statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2012 are presented. We participated in the ASR (English), MT (English-French, Arabic-English, Chinese-English, German-English) and SLT (English-French) tracks. For the MT track both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated in the MT and SLT tracks, including domain adaptation via data selection, translation model interpolation, phrase training for hierarchical and phrase-based systems, additional reordering model, word class language model, various Arabic and Chinese segmentation methods, postprocessing of speech recognition output with an SMT system, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

Review of Hypothesis Alignment Algorithms for MT System Combination via Confusion Network Decoding
Antti-Veikko Rosti | Xiaodong He | Damianos Karakos | Gregor Leusch | Yuan Cao | Markus Freitag | Spyros Matsoukas | Hermann Ney | Jason Smith | Bing Zhang
Proceedings of the Seventh Workshop on Statistical Machine Translation

The RWTH Aachen Machine Translation System for WMT 2012
Matthias Huck | Stephan Peitz | Markus Freitag | Malte Nuhn | Hermann Ney
Proceedings of the Seventh Workshop on Statistical Machine Translation

Joint WMT 2012 Submission of the QUAERO Project
Markus Freitag | Stephan Peitz | Matthias Huck | Hermann Ney | Jan Niehues | Teresa Herrmann | Alex Waibel | Hai-son Le | Thomas Lavergne | Alexandre Allauzen | Bianka Buschbeck | Josep Maria Crego | Jean Senellart
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

The RWTH System Combination System for WMT 2011
Gregor Leusch | Markus Freitag | Hermann Ney
Proceedings of the Sixth Workshop on Statistical Machine Translation

Joint WMT Submission of the QUAERO Project
Markus Freitag | Gregor Leusch | Joern Wuebker | Stephan Peitz | Hermann Ney | Teresa Herrmann | Jan Niehues | Alex Waibel | Alexandre Allauzen | Gilles Adda | Josep Maria Crego | Bianka Buschbeck | Tonio Wandmacher | Jean Senellart
Proceedings of the Sixth Workshop on Statistical Machine Translation

The RWTH Aachen Machine Translation System for WMT 2011
Matthias Huck | Joern Wuebker | Christoph Schmidt | Markus Freitag | Stephan Peitz | Daniel Stein | Arnaud Dagnelies | Saab Mansour | Gregor Leusch | Hermann Ney
Proceedings of the Sixth Workshop on Statistical Machine Translation

The RWTH Aachen machine translation system for IWSLT 2011
Joern Wuebker | Matthias Huck | Saab Mansour | Markus Freitag | Minwei Feng | Stephan Peitz | Christoph Schmidt | Hermann Ney
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2011 are presented. We participated in the MT (English-French, Arabic-English, Chinese-English) and SLT (English-French) tracks. Both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated, including domain adaptation via monolingual and bilingual data selection, phrase training, different lexical smoothing methods, additional reordering models for the hierarchical system, various Arabic and Chinese segmentation methods, punctuation prediction for speech recognition output, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

Advances on spoken language translation in the Quaero program
Karim Boudahmane | Bianka Buschbeck | Eunah Cho | Josep Maria Crego | Markus Freitag | Thomas Lavergne | Hermann Ney | Jan Niehues | Stephan Peitz | Jean Senellart | Artem Sokolov | Alex Waibel | Tonio Wandmacher | Joern Wuebker | François Yvon
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

The Quaero program is an international project promoting research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Within the program framework, research organizations and industrial partners collaborate to develop prototypes of innovating applications and services for access and usage of multimedia data. One of the topics addressed is the translation of spoken language. Each year, a project-internal evaluation is conducted by DGA to monitor the technological advances. This work describes the design and results of the 2011 evaluation campaign. The participating partners were RWTH, KIT, LIMSI and SYSTRAN. Their approaches are compared on both ASR output and reference transcripts of speech data for the translation between French and German. The results show that the developed techniques further the state of the art and improve translation quality.

Modeling punctuation prediction as machine translation
Stephan Peitz | Markus Freitag | Arne Mauser | Hermann Ney
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

Punctuation prediction is an important task in Spoken Language Translation. The output of speech recognition systems does not typically contain punctuation marks. In this paper we analyze different methods for punctuation prediction and show improvements in the quality of the final translation output. In our experiments we compare the different approaches and show improvements of up to 0.8 BLEU points on the IWSLT 2011 English-French Speech Translation of Talks task by using a translation system to translate from unpunctuated to punctuated text instead of a language-model-based punctuation prediction method. Furthermore, we do a system combination of the hypotheses of all our different approaches and get an additional improvement of 0.4 points in BLEU.
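
Treating punctuation prediction as translation requires parallel data whose source side is unpunctuated and whose target side is punctuated. A minimal sketch of how one such training pair could be derived from punctuated text is given below. It assumes the text is already tokenized so that punctuation marks are separate tokens; the function name and the example sentence are hypothetical.

```python
import string


def make_training_pair(punctuated_sentence):
    """Return (source, target): source has punctuation tokens removed, target keeps them."""
    tokens = punctuated_sentence.split()
    # Drop tokens that consist only of ASCII punctuation characters.
    source_tokens = [
        tok for tok in tokens
        if not all(ch in string.punctuation for ch in tok)
    ]
    return " ".join(source_tokens), " ".join(tokens)


source, target = make_training_pair("well , that is all .")
print(source)  # well that is all
print(target)  # well , that is all .
```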