<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="3200">
    <title>Proceedings of the First Workshop on Neural Machine Translation</title>
    <editor>Thang Luong</editor>
    <editor>Alexandra Birch</editor>
    <editor>Graham Neubig</editor>
    <editor>Andrew Finch</editor>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-32</url>
    <bibtype>book</bibtype>
    <bibkey>NMT:2017</bibkey>
  </paper>

  <paper id="3201">
    <title>An Empirical Study of Adequate Vision Span for Attention-Based Neural Machine Translation</title>
    <author><first>Raphael</first><last>Shu</last></author>
    <author><first>Hideki</first><last>Nakayama</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;10</pages>
    <url>http://www.aclweb.org/anthology/W17-3201</url>
    <abstract>Recently, the attention mechanism has played a key role in achieving high performance
	in Neural Machine Translation models. However, because it computes a score function
	for the encoder states at all positions at each decoding step, the attention
	model greatly increases the computational complexity. In this paper, we
	investigate the adequate vision span of attention models in the context of
	machine translation, by proposing a novel attention framework that is capable
	of reducing redundant score computation dynamically. The term "vision span"
	means a window of the encoder states considered by the attention model in one
	step. In our experiments, we found that the average window size of vision span
	can be reduced by over 50% with modest loss in accuracy on English-Japanese and
	German-English translation tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shu-nakayama:2017:NMT</bibkey>
  </paper>

  <paper id="3202">
    <title>Analyzing Neural MT Search and Model Performance</title>
    <author><first>Jan</first><last>Niehues</last></author>
    <author><first>Eunah</first><last>Cho</last></author>
    <author><first>Thanh-Le</first><last>Ha</last></author>
    <author><first>Alex</first><last>Waibel</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>11&#8211;17</pages>
    <url>http://www.aclweb.org/anthology/W17-3202</url>
    <abstract>In this paper, we offer an in-depth analysis of modeling and search
	performance. We address the question of whether a more complex search algorithm is
	necessary. Furthermore, we investigate whether more complex models,
	which might only be applicable during rescoring, are promising.
	By separating the search space and the modeling using n-best list reranking, we
	analyze the influence of both parts of an NMT system independently. By
	comparing differently performing NMT systems, we show that the better
	translation is already in the search space of the lower-performing
	translation systems. These results indicate that the current search algorithms are
	sufficient for the NMT systems. Furthermore, we show that even a
	relatively small $n$-best list of $50$ hypotheses already contains notably
	better translations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>niehues-EtAl:2017:NMT</bibkey>
  </paper>

  <paper id="3203">
    <title>Stronger Baselines for Trustable Results in Neural Machine Translation</title>
    <author><first>Michael</first><last>Denkowski</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>18&#8211;27</pages>
    <url>http://www.aclweb.org/anthology/W17-3203</url>
    <abstract>Interest in neural machine translation has grown rapidly as its effectiveness
	has been demonstrated across language and data scenarios.  New research
	regularly introduces architectural and algorithmic improvements that lead to
	significant gains over &#x201c;vanilla&#x201d; NMT implementations.  However, these new
	techniques are rarely evaluated in the context of previously published
	techniques, specifically those that are widely used in state-of-the-art
	production and shared-task systems.  As a result, it is often difficult to
	determine whether improvements from research will carry over to systems
	deployed for real-world use.  In this work, we recommend three specific methods
	that are relatively easy to implement and result in much stronger experimental
	systems.  Beyond reporting significantly higher BLEU scores, we conduct an
	in-depth analysis of where improvements originate and what inherent weaknesses
	of basic NMT models are being addressed.  We then compare the relative gains
	afforded by several other techniques proposed in the literature when starting
	with vanilla systems versus our stronger baselines, showing that experimental
	conclusions may change depending on the baseline chosen.  This indicates that
	choosing a strong baseline is crucial for reporting reliable experimental
	results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>denkowski-neubig:2017:NMT</bibkey>
  </paper>

  <paper id="3204">
    <title>Six Challenges for Neural Machine Translation</title>
    <author><first>Philipp</first><last>Koehn</last></author>
    <author><first>Rebecca</first><last>Knowles</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>28&#8211;39</pages>
    <url>http://www.aclweb.org/anthology/W17-3204</url>
    <abstract>We explore six challenges for neural machine translation: domain mismatch,
	amount of training data, rare words, long sentences, word alignment, and beam
	search. We show both deficiencies relative to and improvements over the quality of
	phrase-based statistical machine translation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>koehn-knowles:2017:NMT</bibkey>
  </paper>

  <paper id="3205">
    <title>Cost Weighting for Neural Machine Translation Domain Adaptation</title>
    <author><first>Boxing</first><last>Chen</last></author>
    <author><first>Colin</first><last>Cherry</last></author>
    <author><first>George</first><last>Foster</last></author>
    <author><first>Samuel</first><last>Larkin</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>40&#8211;46</pages>
    <url>http://www.aclweb.org/anthology/W17-3205</url>
    <abstract>In this paper, we propose a new domain adaptation technique for neural machine
	translation called cost weighting, which is appropriate for adaptation
	scenarios in which a small in-domain data set and a large general-domain data
	set are available. Cost weighting incorporates a domain classifier into the
	neural machine translation training algorithm, using features derived from the
	encoder representation in order to distinguish in-domain from out-of-domain
	data. Classifier probabilities are used to weight sentences according to their
	domain similarity when updating the parameters of the neural translation model.
	We compare cost weighting to two traditional domain adaptation techniques
	developed for statistical machine translation: data selection and sub-corpus
	weighting. Experiments on two large-data tasks show that both the traditional
	techniques and our novel proposal lead to significant gains, with cost
	weighting outperforming the traditional methods.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>chen-EtAl:2017:NMT</bibkey>
  </paper>

  <paper id="3206">
    <title>Detecting Untranslated Content for Neural Machine Translation</title>
    <author><first>Isao</first><last>Goto</last></author>
    <author><first>Hideki</first><last>Tanaka</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>47&#8211;55</pages>
    <url>http://www.aclweb.org/anthology/W17-3206</url>
    <abstract>Despite its promise, neural machine translation (NMT) has a serious problem in
	that source content may be mistakenly left untranslated. The ability to detect
	untranslated content is important for the practical use of NMT. We evaluate two
	types of probability with which to detect untranslated content: the cumulative
	attention (ATN) probability and back translation (BT) probability from the
	target sentence to the source sentence. Experiments on detecting untranslated
	content in Japanese-English patent translations show that ATN and BT are each
	more effective than random choice, BT is more effective than ATN, and the
	combination of the two provides further improvements. We also confirm the
	effectiveness of using ATN and BT to rerank the n-best NMT outputs.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>goto-tanaka:2017:NMT</bibkey>
  </paper>

  <paper id="3207">
    <title>Beam Search Strategies for Neural Machine Translation</title>
    <author><first>Markus</first><last>Freitag</last></author>
    <author><first>Yaser</first><last>Al-Onaizan</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>56&#8211;60</pages>
    <url>http://www.aclweb.org/anthology/W17-3207</url>
    <abstract>The basic concept in Neural Machine Translation (NMT) is to train a large
	Neural Network that maximizes the translation performance on a given parallel
	corpus. NMT then uses a simple left-to-right beam-search decoder to
	generate new translations that approximately maximize the trained conditional
	probability. The current beam search strategy generates the target sentence
	word by word from left to right while keeping a fixed number of active
	candidates at each time step. First, this simple search is less adaptive, as it
	also expands candidates whose scores are much worse than the current best.
	Second, it does not expand hypotheses that are not within the best-scoring
	candidates, even if their scores are close to the best one. The latter can
	be avoided by increasing the beam size until no performance improvement can be
	observed. While this can reach better performance, it has the drawback of
	slower decoding speed. In this paper, we concentrate on speeding up the decoder
	by applying a more flexible beam search strategy whose candidate size may vary
	at each time step depending on the candidate scores. We speed up the original
	decoder by up to 43% for the two language pairs German to English and Chinese
	to English without losing any translation quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>freitag-alonaizan:2017:NMT</bibkey>
  </paper>

  <paper id="3208">
    <title>An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation</title>
    <author><first>Makoto</first><last>Morishita</last></author>
    <author><first>Yusuke</first><last>Oda</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Koichiro</first><last>Yoshino</last></author>
    <author><first>Katsuhito</first><last>Sudoh</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>61&#8211;68</pages>
    <url>http://www.aclweb.org/anthology/W17-3208</url>
    <abstract>Training of neural machine translation (NMT) models usually uses mini-batches
	for efficiency purposes.
	During the mini-batched training process, it is necessary to pad shorter
	sentences in a mini-batch to be equal in length to the longest sentence therein
	for efficient computation.
	Previous work has noted that sorting the corpus based on the sentence length
	before making mini-batches reduces the amount of padding and increases the
	processing speed.
	However, despite the fact that mini-batch creation is an essential step in NMT
	training, widely used NMT toolkits implement disparate strategies for doing so,
	which have not been empirically validated or compared.
	This work investigates mini-batch creation strategies with experiments over two
	different datasets.
	Our results suggest that the choice of a mini-batch creation strategy has a
	large effect on NMT training and some length-based sorting strategies do not
	always work well compared with simple shuffling.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>morishita-EtAl:2017:NMT</bibkey>
  </paper>

  <paper id="3209">
    <title>Detecting Cross-Lingual Semantic Divergence for Neural Machine Translation</title>
    <author><first>Marine</first><last>Carpuat</last></author>
    <author><first>Yogarshi</first><last>Vyas</last></author>
    <author><first>Xing</first><last>Niu</last></author>
    <booktitle>Proceedings of the First Workshop on Neural Machine Translation</booktitle>
    <month>August</month>
    <year>2017</year>
    <address>Vancouver</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>69&#8211;79</pages>
    <url>http://www.aclweb.org/anthology/W17-3209</url>
    <abstract>Parallel corpora are often not as parallel as one might assume: non-literal
	translations and noisy translations abound, even in curated corpora routinely
	used for training and evaluation. We use a cross-lingual textual entailment
	system to distinguish sentence pairs that are parallel in meaning from those
	that are not, and show that filtering out divergent examples from training
	improves translation quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>carpuat-vyas-niu:2017:NMT</bibkey>
  </paper>

</volume>