Jakob Verbeek
2020
Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English
Maha Elbayad | Michael Ustaszewski | Emmanuelle Esperança-Rodier | Francis Brunet-Manquat | Jakob Verbeek | Laurent Besacier
Proceedings of the 28th International Conference on Computational Linguistics
In this work we conduct an evaluation study comparing offline and online neural machine translation architectures. Two sequence-to-sequence models are considered: the convolutional Pervasive Attention model (Elbayad et al., 2018) and the attention-based Transformer (Vaswani et al., 2017). For both architectures, we investigate the impact of online decoding constraints on translation quality through a carefully designed human evaluation on the English-German and German-English language pairs, the latter being particularly sensitive to latency constraints. The evaluation results allow us to identify the strengths and shortcomings of each model when shifting to the online setup.
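For readers unfamiliar with what an online decoding constraint entails, the sketch below shows a wait-k read/write schedule, a common simultaneous-translation policy; the function name and parameters are illustrative assumptions, not necessarily the exact policy evaluated in the paper.

```python
def wait_k_schedule(k, src_len, tgt_len):
    """Illustrative wait-k schedule for online decoding: before emitting
    target token t, the decoder has only read min(k + t, src_len) source
    tokens, whereas an offline decoder always conditions on all src_len
    tokens. Smaller k means lower latency but less source context."""
    return [min(k + t, src_len) for t in range(tgt_len)]

# With k=3 and a 10-token source, the first target token is produced after
# reading just 3 source tokens; the available context then grows per step.
print(wait_k_schedule(k=3, src_len=10, tgt_len=8))
# -> [3, 4, 5, 6, 7, 8, 9, 10]
```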
2018
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
Maha Elbayad | Laurent Besacier | Jakob Verbeek
Proceedings of the 22nd Conference on Computational Natural Language Learning
Current state-of-the-art machine translation systems are based on encoder-decoder architectures that first encode the input sequence and then generate an output sequence based on the input encoding. Both are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach which instead relies on a single 2D convolutional neural network across both sequences. Each layer of our network re-codes source tokens on the basis of the output sequence produced so far. Attention-like properties are therefore pervasive throughout the network. Our model yields excellent results, outperforming state-of-the-art encoder-decoder systems, while being conceptually simpler and having fewer parameters.
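A minimal PyTorch sketch of the mechanism described above (module names, the grid-building helper, and hyper-parameters are assumptions; the actual model stacks many such layers with DenseNet-style connections): each cell (t, s) of a joint target-by-source grid holds the concatenated embeddings of target token t and source token s, and 2D convolutions over that grid are masked so that a cell never sees future target tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_grid(src_emb, tgt_emb):
    """Join a source and a target sentence into a 2D grid: cell (t, s) holds
    the concatenation of target embedding t and source embedding s.
    src_emb: (batch, src_len, d), tgt_emb: (batch, tgt_len, d)."""
    b, s, d = src_emb.shape
    t = tgt_emb.size(1)
    grid = torch.cat([
        tgt_emb.unsqueeze(2).expand(b, t, s, d),
        src_emb.unsqueeze(1).expand(b, t, s, d),
    ], dim=-1)                       # (batch, tgt_len, src_len, 2d)
    return grid.permute(0, 3, 1, 2)  # (batch, 2d, tgt_len, src_len) for Conv2d

class MaskedConv2dLayer(nn.Module):
    """One convolutional layer over the grid, masked along the target axis:
    the output at row t depends only on target positions <= t, so source
    tokens are re-coded given the target prefix produced so far."""
    def __init__(self, channels, k=3):   # kernel size assumed odd
        super().__init__()
        self.k = k
        self.conv = nn.Conv2d(channels, channels, k)

    def forward(self, x):
        # x: (batch, channels, tgt_len, src_len)
        k = self.k
        # Pad the source axis symmetrically, but the target axis only with
        # "past" rows, which enforces causality along the output sequence.
        x = F.pad(x, (k // 2, k // 2, k - 1, 0))  # (left, right, top, bottom)
        return torch.relu(self.conv(x))
```

Stacking such layers and then pooling the grid along the source axis before a per-position output softmax gives the flavour of the full model; the published architecture additionally uses dense connectivity between layers.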
Token-level and sequence-level loss smoothing for RNN language models
Maha Elbayad | Laurent Besacier | Jakob Verbeek
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from 'exposure bias': during training tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations, we build upon the recent reward augmented maximum likelihood approach, which encourages the model to predict sentences that are close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that token-level and sequence-level loss smoothing are complementary and significantly improve results.
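A rough Python sketch of the two smoothing levels discussed above (function names, the smoothing weight, and the Hamming-based sampler are illustrative assumptions rather than the paper's exact formulation): token-level smoothing replaces the one-hot target at each position with a softened distribution, while sequence-level smoothing in the RAML spirit trains on sampled sequences that stay close to the ground truth.

```python
import torch
import torch.nn.functional as F

def token_smoothed_loss(logits, targets, vocab_size, eps=0.1):
    """Token-level loss smoothing: train each position against a smoothed
    distribution (1 - eps on the gold token, eps spread over the rest).
    The paper's token-level variant redistributes that eps mass according to
    a token-level reward; the uniform case is shown here for simplicity.
    logits: (batch, seq_len, vocab), targets: (batch, seq_len)."""
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (vocab_size - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()

def sequence_smoothed_target(target, vocab_size, n_edits=1):
    """Sequence-level smoothing in the RAML spirit: draw a training target
    near the ground truth by substituting a few random tokens, a crude proxy
    for sampling from the exponentiated (Hamming) payoff distribution.
    target: (seq_len,) tensor of token ids."""
    noisy = target.clone()
    positions = torch.randperm(target.size(0))[:n_edits]
    noisy[positions] = torch.randint(0, vocab_size, (n_edits,))
    return noisy
```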