Domenic Donato


2021

Diverse Pretrained Context Encodings Improve Document Translation
Domenic Donato | Lei Yu | Chris Dyer
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We propose a new architecture for adapting a sentence-level sequence-to-sequence transformer by incorporating multiple pretrained document context signals, and assess the impact on translation performance of (1) different pretraining approaches for generating these signals, (2) the quantity of parallel data for which document context is available, and (3) conditioning on source, target, or both source and target contexts. Experiments on the NIST Chinese-English and the IWSLT and WMT English-German tasks support four general conclusions: that using pretrained context representations markedly improves sample efficiency, that adequate parallel data resources are crucial for learning to use document context, that jointly conditioning on multiple context representations outperforms any single representation, and that source-side context is more valuable for translation performance than target-side context. Our best multi-context model consistently outperforms the best existing context-aware transformers.
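The abstract describes the architecture only at a high level. As a rough illustrative sketch (not the authors' exact design), a decoder layer could attend over several frozen, pretrained document-context memories alongside the usual source attention and then gate the resulting signals together; all module and parameter names below are assumptions, and attention masks are omitted for brevity.

import torch
import torch.nn as nn

class MultiContextDecoderLayer(nn.Module):
    """Decoder layer with one extra cross-attention block per pretrained
    document-context signal (e.g. source-side and target-side encodings)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, n_contexts: int = 2):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.src_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ctx_attns = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_contexts)])
        # Learns how much weight to put on each context signal per position.
        self.gate = nn.Linear(d_model * n_contexts, n_contexts)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, y, src_mem, ctx_mems):
        # y: (batch, tgt_len, d_model); src_mem: sentence-level encoder output;
        # ctx_mems: list of frozen, pretrained document-context encodings.
        y = self.norms[0](y + self.self_attn(y, y, y)[0])
        y = self.norms[1](y + self.src_attn(y, src_mem, src_mem)[0])
        ctx_outs = [attn(y, m, m)[0] for attn, m in zip(self.ctx_attns, ctx_mems)]
        weights = torch.softmax(self.gate(torch.cat(ctx_outs, dim=-1)), dim=-1)
        mixed = sum(w.unsqueeze(-1) * c
                    for w, c in zip(weights.unbind(dim=-1), ctx_outs))
        y = self.norms[2](y + mixed)
        return self.norms[3](y + self.ffn(y))

The gating step is one simple way to realize "jointly conditioning on multiple context representations"; the paper evaluates several ways of generating and combining such signals.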

2020

The DeepMind Chinese–English Document Translation System at WMT2020
Lei Yu | Laurent Sartran | Po-Sen Huang | Wojciech Stokowiec | Domenic Donato | Srivatsan Srinivasan | Alek Andreev | Wang Ling | Sona Mokra | Agustin Dal Lago | Yotam Doron | Susannah Young | Phil Blunsom | Chris Dyer
Proceedings of the Fifth Conference on Machine Translation

This paper describes the DeepMind submission to the Chinese–English constrained data track of the WMT2020 Shared Task on News Translation. The submission employs a noisy channel factorization as the backbone of a document translation system. This approach allows the flexible combination of a number of independent component models, which are further augmented with back-translation, distillation, fine-tuning with in-domain data, Monte Carlo Tree Search decoding, and improved uncertainty estimation. To address persistent issues with the premature truncation of long sequences, we included specialized length models and sentence segmentation techniques. Our final system provides a 9.9 BLEU point improvement over a baseline Transformer on our test set (newstest2019).
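For reference, noisy channel factorization for translation follows Bayes' rule, p(y|x) ∝ p(x|y) p(y), so candidates are scored with a channel model and a language model; in practice a direct model and tuned interpolation weights are commonly added. The abstract does not give the exact combination used, so the weights and function names in the sketch below are illustrative assumptions.

def noisy_channel_score(log_p_src_given_tgt: float,   # channel model  log p(x|y)
                        log_p_tgt: float,             # language model log p(y)
                        log_p_tgt_given_src: float,   # direct model   log p(y|x)
                        lam_channel: float = 1.0,
                        lam_lm: float = 1.0,
                        lam_direct: float = 1.0) -> float:
    """Weighted combination of independently trained component models."""
    return (lam_channel * log_p_src_given_tgt
            + lam_lm * log_p_tgt
            + lam_direct * log_p_tgt_given_src)

def rerank(candidates):
    # candidates: list of (translation, log p(x|y), log p(y), log p(y|x))
    # produced by an n-best list or a search procedure; returns the best-scoring one.
    return max(candidates, key=lambda c: noisy_channel_score(c[1], c[2], c[3]))

Because the component models are trained independently, each can be improved separately (back-translation for the channel/direct models, extra monolingual data for the language model) before being recombined at decoding time.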