<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="3200">
    <title>Proceedings of the First Workshop on Neural Machine Translation</title>
    <editor>Thang Luong</editor>
    <editor>Alexandra Birch</editor>
    <editor>Graham Neubig</editor>
    <editor>Andrew Finch</editor>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-32</url>
    <bibtype>book</bibtype>
    <bibkey>NMT:2017</bibkey>
  </paper>

  <paper id="3201">
    <title>An Empirical Study of Adequate Vision Span for Attention-Based Neural Machine Translation</title>
    <author><first>Raphael</first><last>Shu</last></author>
    <author><first>Hideki</first><last>Nakayama</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;10</pages>
    <url>http://www.aclweb.org/anthology/W17-3201</url>
    <abstract>Recently, the attention mechanism has played a key role in achieving high performance
	in Neural Machine Translation models. However, because it computes a score function
	for the encoder states at all positions at each decoding step, the attention
	model greatly increases the computational complexity. In this paper, we
	investigate the adequate vision span of attention models in the context of
	machine translation, by proposing a novel attention framework that is capable
	of reducing redundant score computation dynamically. The term "vision span"
	means a window of the encoder states considered by the attention model in one
	step. In our experiments, we found that the average window size of vision span
	can be reduced by over 50% with modest loss in accuracy on English-Japanese and
	German-English translation tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shu-nakayama:2017:NMT</bibkey>
  </paper>

  <paper id="3202">
    <title>Analyzing Neural MT Search and Model Performance</title>
    <author><first>Jan</first><last>Niehues</last></author>
    <author><first>Eunah</first><last>Cho</last></author>
    <author><first>Thanh-Le</first><last>Ha</last></author>
    <author><first>Alex</first><last>Waibel</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>11&#8211;17</pages>
    <url>http://www.aclweb.org/anthology/W17-3202</url>
    <abstract>In this paper, we offer an in-depth analysis of modeling and search
	performance. We address the question of whether a more complex search algorithm is
	necessary. Furthermore, we investigate whether more complex models,
	which might only be applicable during rescoring, are promising.
	By separating the search space and the modeling using n-best list reranking, we
	analyze the influence of both parts of an NMT system independently. By
	comparing differently performing NMT systems, we show that the better
	translation is already in the search space of the lower-performing
	translation systems. These results indicate that the current search algorithms are
	sufficient for the NMT systems. Furthermore, we show that even a
	relatively small $n$-best list of $50$ hypotheses already contains notably
	better translations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>niehues-EtAl:2017:NMT</bibkey>
  </paper>

  <paper id="3203">
    <title>Stronger Baselines for Trustable Results in Neural Machine Translation</title>
    <author><first>Michael</first><last>Denkowski</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>18&#8211;27</pages>
    <url>http://www.aclweb.org/anthology/W17-3203</url>
    <abstract>Interest in neural machine translation has grown rapidly as its effectiveness
	has been demonstrated across language and data scenarios.  New research
	regularly introduces architectural and algorithmic improvements that lead to
	significant gains over &#x201c;vanilla&#x201d; NMT implementations.  However, these new
	techniques are rarely evaluated in the context of previously published
	techniques, specifically those that are widely used in state-of-the-art
	production and shared-task systems.  As a result, it is often difficult to
	determine whether improvements from research will carry over to systems
	deployed for real-world use.  In this work, we recommend three specific methods
	that are relatively easy to implement and result in much stronger experimental
	systems.  Beyond reporting significantly higher BLEU scores, we conduct an
	in-depth analysis of where improvements originate and what inherent weaknesses
	of basic NMT models are being addressed.  We then compare the relative gains
	afforded by several other techniques proposed in the literature when starting
	with vanilla systems versus our stronger baselines, showing that experimental
	conclusions may change depending on the baseline chosen.  This indicates that
	choosing a strong baseline is crucial for reporting reliable experimental
	results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>denkowski-neubig:2017:NMT</bibkey>
  </paper>

  <paper id="3204">
    <title>Six Challenges for Neural Machine Translation</title>
    <author><first>Philipp</first><last>Koehn</last></author>
    <author><first>Rebecca</first><last>Knowles</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>28&#8211;39</pages>
    <url>http://www.aclweb.org/anthology/W17-3204</url>
    <abstract>We explore six challenges for neural machine translation: domain mismatch,
	amount of training data, rare words, long sentences, word alignment, and beam
	search. We show both deficiencies relative to and improvements over the quality of
	phrase-based statistical machine translation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>koehn-knowles:2017:NMT</bibkey>
  </paper>

  <paper id="3205">
    <title>Cost Weighting for Neural Machine Translation Domain Adaptation</title>
    <author><first>Boxing</first><last>Chen</last></author>
    <author><first>Colin</first><last>Cherry</last></author>
    <author><first>George</first><last>Foster</last></author>
    <author><first>Samuel</first><last>Larkin</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>40&#8211;46</pages>
    <url>http://www.aclweb.org/anthology/W17-3205</url>
    <abstract>In this paper, we propose a new domain adaptation technique for neural machine
	translation called cost weighting, which is appropriate for adaptation
	scenarios in which a small in-domain data set and a large general-domain data
	set are available. Cost weighting incorporates a domain classifier into the
	neural machine translation training algorithm, using features derived from the
	encoder representation in order to distinguish in-domain from out-of-domain
	data. Classifier probabilities are used to weight sentences according to their
	domain similarity when updating the parameters of the neural translation model.
	We compare cost weighting to two traditional domain adaptation techniques
	developed for statistical machine translation: data selection and sub-corpus
	weighting. Experiments on two large-data tasks show that both the traditional
	techniques and our novel proposal lead to significant gains, with cost
	weighting outperforming the traditional methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-EtAl:2017:NMT</bibkey>
  </paper>

  <paper id="3206">
    <title>Detecting Untranslated Content for Neural Machine Translation</title>
    <author><first>Isao</first><last>Goto</last></author>
    <author><first>Hideki</first><last>Tanaka</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>47&#8211;55</pages>
    <url>http://www.aclweb.org/anthology/W17-3206</url>
    <abstract>Despite its promise, neural machine translation (NMT) has a serious problem in
	that source content may be mistakenly left untranslated. The ability to detect
	untranslated content is important for the practical use of NMT. We evaluate two
	types of probability with which to detect untranslated content: the cumulative
	attention (ATN) probability and back translation (BT) probability from the
	target sentence to the source sentence. Experiments on detecting untranslated
	content in Japanese-English patent translations show that ATN and BT are each
	more effective than random choice, BT is more effective than ATN, and the
	combination of the two provides further improvements. We also confirm the
	effectiveness of using ATN and BT to rerank the n-best NMT outputs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>goto-tanaka:2017:NMT</bibkey>
  </paper>

  <paper id="3207">
    <title>Beam Search Strategies for Neural Machine Translation</title>
    <author><first>Markus</first><last>Freitag</last></author>
    <author><first>Yaser</first><last>Al-Onaizan</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>56&#8211;60</pages>
    <url>http://www.aclweb.org/anthology/W17-3207</url>
    <abstract>The basic concept in Neural Machine Translation (NMT) is to train a large
	Neural Network that maximizes the translation performance on a given parallel
	corpus. NMT then uses a simple left-to-right beam-search decoder to
	generate new translations that approximately maximize the trained conditional
	probability. The current beam search strategy generates the target sentence
	word by word from left to right while keeping a fixed number of active
	candidates at each time step. First, this simple search is less adaptive, as it
	also expands candidates whose scores are much worse than the current best.
	Second, it does not expand hypotheses that are not within the best-scoring
	candidates, even if their scores are close to the best one. The latter can
	be avoided by increasing the beam size until no performance improvement can be
	observed. While this can reach better performance, it has the drawback of
	slower decoding speed. In this paper, we concentrate on speeding up the decoder
	by applying a more flexible beam search strategy whose candidate size may vary
	at each time step depending on the candidate scores. We speed up the original
	decoder by up to 43% for the two language pairs German to English and Chinese
	to English without losing any translation quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>freitag-alonaizan:2017:NMT</bibkey>
  </paper>

  <paper id="3208">
    <title>An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation</title>
    <author><first>Makoto</first><last>Morishita</last></author>
    <author><first>Yusuke</first><last>Oda</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Koichiro</first><last>Yoshino</last></author>
    <author><first>Katsuhito</first><last>Sudoh</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>61&#8211;68</pages>
    <url>http://www.aclweb.org/anthology/W17-3208</url>
    <abstract>Training of neural machine translation (NMT) models usually uses mini-batches
	for efficiency purposes.
	During the mini-batched training process, it is necessary to pad shorter
	sentences in a mini-batch to be equal in length to the longest sentence therein
	for efficient computation.
	Previous work has noted that sorting the corpus based on the sentence length
	before making mini-batches reduces the amount of padding and increases the
	processing speed.
	However, despite the fact that mini-batch creation is an essential step in NMT
	training, widely used NMT toolkits implement disparate strategies for doing so,
	which have not been empirically validated or compared.
	This work investigates mini-batch creation strategies with experiments over two
	different datasets.
	Our results suggest that the choice of a mini-batch creation strategy has a
	large effect on NMT training and some length-based sorting strategies do not
	always work well compared with simple shuffling.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>morishita-EtAl:2017:NMT</bibkey>
  </paper>

  <paper id="3209">
    <title>Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation</title>
    <author><first>Marine</first><last>Carpuat</last></author>
    <author><first>Yogarshi</first><last>Vyas</last></author>
    <author><first>Xing</first><last>Niu</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>69&#8211;79</pages>
    <url>http://www.aclweb.org/anthology/W17-3209</url>
    <abstract>Parallel corpora are often not as parallel as one might assume: non-literal
	translations and noisy translations abound, even in curated corpora routinely
	used for training and evaluation. We use a cross-lingual textual entailment
	system to distinguish sentence pairs that are parallel in meaning from those
	that are not, and show that filtering out divergent examples from training
	improves translation quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>carpuat-vyas-niu:2017:NMT</bibkey>
  </paper>

</volume>