<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="5700">
    <title>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</title>
    <editor><first>Toshiaki</first><last>Nakazawa</last></editor>
    <editor><first>Isao</first><last>Goto</last></editor>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <url>http://www.aclweb.org/anthology/W17-57</url>
    <bibtype>book</bibtype>
    <bibkey>WAT2017:2017</bibkey>
  </paper>

  <paper id="5701">
    <title>Overview of the 4th Workshop on Asian Translation</title>
    <author><first>Toshiaki</first><last>Nakazawa</last></author>
    <author><first>Shohei</first><last>Higashiyama</last></author>
    <author><first>Chenchen</first><last>Ding</last></author>
    <author><first>Hideya</first><last>Mino</last></author>
    <author><first>Isao</first><last>Goto</last></author>
    <author><first>Hideto</first><last>Kazawa</last></author>
    <author><first>Yusuke</first><last>Oda</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Sadao</first><last>Kurohashi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>1&#8211;54</pages>
    <url>http://www.aclweb.org/anthology/W17-5701</url>
    <abstract>This paper presents the results of the shared tasks from the 4th Workshop on
	Asian Translation (WAT2017), including the J&#x2194;E and J&#x2194;C scientific paper translation
	subtasks, the C&#x2194;J, K&#x2194;J and E&#x2194;J patent translation subtasks, the H&#x2194;E mixed domain
	subtasks, the J&#x2194;E newswire subtasks and the J&#x2194;E recipe subtasks. For WAT2017,
	12 institutions participated in the shared tasks. About 300 translation results
	were submitted to the automatic evaluation server, and selected
	submissions were manually evaluated.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nakazawa-EtAl:2017:WAT2017</bibkey>
  </paper>

  <paper id="5702">
    <title>Controlling Target Features in Neural Machine Translation via Prefix Constraints</title>
    <author><first>Shunsuke</first><last>Takeno</last></author>
    <author><first>Masaaki</first><last>Nagata</last></author>
    <author><first>Kazuhide</first><last>Yamamoto</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>55&#8211;63</pages>
    <url>http://www.aclweb.org/anthology/W17-5702</url>
    <abstract>We propose prefix constraints, a novel method to enforce
	  constraints on target sentences in neural machine translation. It
	  places a sequence of special tokens at the beginning of the target
	  sentence (target prefix), whereas side constraints
	  place a special token at the end of
	  the source sentence (source suffix). Prefix constraints can be predicted
	  from the source sentence jointly with the target sentence, while side
	  constraints must be provided by the user or predicted by some other
	  method. In both methods, special tokens are designed to encode
	  arbitrary target-side features or metatextual information. We
	  show that prefix constraints are more flexible than side constraints
	  and can be used to control the behavior of neural machine
	  translation in terms of output length, bidirectional decoding,
	  domain adaptation, and unaligned target word generation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>takeno-nagata-yamamoto:2017:WAT2017</bibkey>
  </paper>

  <paper id="5703">
    <title>Improving Japanese-to-English Neural Machine Translation by Paraphrasing the Target Language</title>
    <author><first>Yuuki</first><last>Sekizawa</last></author>
    <author><first>Tomoyuki</first><last>Kajiwara</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>64&#8211;69</pages>
    <url>http://www.aclweb.org/anthology/W17-5703</url>
    <abstract>Neural machine translation (NMT) produces sentences that are more fluent than
	those produced by statistical machine translation (SMT). However, NMT has a
	very high computational cost because of the high dimensionality of the output
	layer. Generally, NMT restricts the size of the vocabulary, which results in
	infrequent words being treated as out-of-vocabulary (OOV) and degrades
	translation performance. To address this problem, we paraphrase rare words on
	the target side of the training data into more frequent expressions, thereby
	reducing the number of OOV words. In our evaluation, we achieved a
	statistically significant BLEU score improvement of 0.55-0.77 over baselines
	including the state-of-the-art method.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sekizawa-kajiwara-komachi:2017:WAT2017</bibkey>
  </paper>

  <paper id="5704">
    <title>Improving Low-Resource Neural Machine Translation with Filtered Pseudo-Parallel Corpus</title>
    <author><first>Aizhan</first><last>Imankulova</last></author>
    <author><first>Takayuki</first><last>Sato</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>70&#8211;78</pages>
    <url>http://www.aclweb.org/anthology/W17-5704</url>
    <abstract>Large-scale parallel corpora are indispensable for training highly accurate
	machine translation systems.
	However, manually constructed large-scale parallel corpora are not freely
	available for many language pairs.
	In previous studies, training data have been expanded using a pseudo-parallel
	corpus obtained by machine translation of a monolingual corpus in the
	target language.
	However, for low-resource language pairs in which only low-accuracy machine
	translation systems are available, translation quality is reduced when a
	pseudo-parallel corpus is used naively.
	To improve machine translation performance on low-resource language pairs, we
	propose a method to expand the training data effectively by filtering the
	pseudo-parallel corpus using a quality estimation based on back-translation.
	In experiments with three language pairs using small, medium, and large
	parallel corpora, the language pairs with less training data filtered out
	more sentence pairs and showed larger BLEU score improvements.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>imankulova-sato-komachi:2017:WAT2017</bibkey>
  </paper>

  <paper id="5705">
    <title>Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing</title>
    <author><first>Atsushi</first><last>Fujita</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>79&#8211;88</pages>
    <url>http://www.aclweb.org/anthology/W17-5705</url>
    <abstract>Aiming at facilitating research on quality estimation (QE) and
	automatic post-editing (APE) of machine translation (MT) outputs,
	especially among Asian languages, we have created new
	datasets for Japanese to English, Chinese, and Korean translations.
	As the source text, actual utterances in Japanese were extracted from the
	log data of our speech translation service. MT outputs were then produced by
	phrase-based statistical MT systems. Finally, human evaluators were employed
	to grade the quality of the MT outputs and to post-edit them.
	This paper describes the characteristics of the created datasets and
	reports on our benchmarking experiments on word-level QE,
	sentence-level QE, and APE conducted using these datasets.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>fujita-sumita:2017:WAT2017</bibkey>
  </paper>

  <paper id="5706">
    <title>NTT Neural Machine Translation Systems at WAT 2017</title>
    <author><first>Makoto</first><last>Morishita</last></author>
    <author><first>Jun</first><last>Suzuki</last></author>
    <author><first>Masaaki</first><last>Nagata</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>89&#8211;94</pages>
    <url>http://www.aclweb.org/anthology/W17-5706</url>
    <abstract>This year, we participated in four translation subtasks at WAT 2017.
	Our model structure is quite simple, but we used it with well-tuned
	hyper-parameters, leading to a significant improvement over the previous
	state-of-the-art system.
	We also made use of the unreliable part of the provided parallel
	corpus by back-translating it and building a synthetic corpus.
	Our submitted system achieved new state-of-the-art performance in terms of
	both the BLEU score and human evaluation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>morishita-suzuki-nagata:2017:WAT2017</bibkey>
  </paper>

  <paper id="5707">
    <title>XMU Neural Machine Translation Systems for WAT 2017</title>
    <author><first>Boli</first><last>Wang</last></author>
    <author><first>Zhixing</first><last>Tan</last></author>
    <author><first>Jinming</first><last>Hu</last></author>
    <author><first>Yidong</first><last>Chen</last></author>
    <author><first>Xiaodong</first><last>Shi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>95&#8211;98</pages>
    <url>http://www.aclweb.org/anthology/W17-5707</url>
    <abstract>This paper describes the neural machine translation systems of Xiamen
	University for the shared translation tasks of WAT 2017. Our systems are based
	on the encoder-decoder framework with attention. We participated in three
	subtasks and experimented with subword segmentation, synthetic training data,
	and model ensembling. Experiments show that all of these methods give
	substantial improvements.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2017:WAT2017</bibkey>
  </paper>

  <paper id="5708">
    <title>A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size</title>
    <author><first>Masato</first><last>Neishi</last></author>
    <author><first>Jin</first><last>Sakuma</last></author>
    <author><first>Satoshi</first><last>Tohda</last></author>
    <author><first>Shonosuke</first><last>Ishiwatari</last></author>
    <author><first>Naoki</first><last>Yoshinaga</last></author>
    <author><first>Masashi</first><last>Toyoda</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>99&#8211;109</pages>
    <url>http://www.aclweb.org/anthology/W17-5708</url>
    <abstract>In this paper, we describe team UT-IIS's system and results for the WAT
	2017 translation tasks. We investigated several tricks, including a
	novel technique for initializing embedding layers using only the parallel
	corpus, which increased the BLEU score by 1.28; we also found a practical
	large batch size of 256 and gained insights regarding hyperparameter
	settings. Ultimately, our system obtained a better result than the
	state-of-the-art system of WAT 2016. Our code is available at
	https://github.com/nem6ishi/wat17.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>neishi-EtAl:2017:WAT2017</bibkey>
  </paper>

  <paper id="5709">
    <title>Patent NMT integrated with Large Vocabulary Phrase Translation by SMT at WAT 2017</title>
    <author><first>Zi</first><last>Long</last></author>
    <author><first>Ryuichiro</first><last>Kimura</last></author>
    <author><first>Takehito</first><last>Utsuro</last></author>
    <author><first>Tomoharu</first><last>Mitsuhashi</last></author>
    <author><first>Mikio</first><last>Yamamoto</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>110&#8211;118</pages>
    <url>http://www.aclweb.org/anthology/W17-5709</url>
    <abstract>Neural machine translation (NMT) cannot handle a large vocabulary
	 because the training complexity and decoding complexity increase
	 proportionally with the number of target words. This problem becomes even
	 more serious when translating patent documents, which contain many
	 technical terms that are observed infrequently. Long et al. (2017)
	 proposed selecting phrases that contain out-of-vocabulary words using
	 the statistical approach of branching entropy. The selected phrases
	 are then replaced with tokens during training and post-translated using
	 the phrase translation table of SMT. In this paper, we apply the
	 method proposed by Long et al. (2017) to the WAT 2017 Japanese-Chinese
	 and Japanese-English patent datasets. Evaluation on
	 Japanese-to-Chinese, Chinese-to-Japanese, Japanese-to-English and
	 English-to-Japanese patent sentence translation proved the
	 effectiveness of the phrases selected with branching entropy: the NMT
	 model of Long et al. (2017) achieves a substantial improvement over a
	 baseline NMT model without the proposed technique.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>long-EtAl:2017:WAT2017</bibkey>
  </paper>

  <paper id="5710">
    <title>SMT reranked NMT</title>
    <author><first>Terumasa</first><last>Ehara</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>119&#8211;126</pages>
    <url>http://www.aclweb.org/anthology/W17-5710</url>
    <abstract>This paper describes the system architecture, experimental settings and
	experimental results of the EHR team for the WAT2017 tasks. We participated
	in three tasks: JPCen-ja, JPCzh-ja and JPCko-ja. Although the basic
	architecture of our system is NMT, we rerank the NMT outputs using SMT
	results. Major drawbacks of NMT are under-translation and over-translation,
	whereas SMT rarely produces such translations. Therefore, by reranking the
	n-best NMT outputs with the SMT output, such translations can be expected to
	be discarded. With this technique, we improved the BLEU score from 46.03 to
	47.08 in the JPCzh-ja task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ehara:2017:WAT2017</bibkey>
  </paper>

  <paper id="5711">
    <title>Ensemble and Reranking: Using Multiple Models in the NICT-2 Neural Machine Translation System at WAT2017</title>
    <author><first>Kenji</first><last>Imamura</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>127&#8211;134</pages>
    <url>http://www.aclweb.org/anthology/W17-5711</url>
    <abstract>In this paper, we describe the NICT-2 neural machine translation system
	evaluated at WAT2017. This system uses multiple models as an ensemble and
	combines models with opposite decoding directions by reranking (called
	bi-directional reranking).
	In our experiments on small data sets, the translation quality improved as
	the number of models was increased up to 32 in total, without saturating.
	In the experiments on large data sets, improvements of 1.59-3.32
	BLEU points were achieved when six-model ensembles were combined by
	bi-directional reranking.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>imamura-sumita:2017:WAT2017</bibkey>
  </paper>

  <paper id="5712">
    <title>A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task</title>
    <author><first>Yusuke</first><last>Oda</last></author>
    <author><first>Katsuhito</first><last>Sudoh</last></author>
    <author><first>Satoshi</first><last>Nakamura</last></author>
    <author><first>Masao</first><last>Utiyama</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>135&#8211;139</pages>
    <url>http://www.aclweb.org/anthology/W17-5712</url>
    <abstract>This paper describes the details of the NAIST-NICT machine translation
	system for the WAT2017 English-Japanese Scientific Paper Translation Task. The
	system consists of a language-independent tokenizer and an attentional
	encoder-decoder style neural machine translation model. According to the
	official results, our system achieves higher translation accuracy than any
	system submitted in previous campaigns, despite its simple model
	architecture.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>oda-EtAl:2017:WAT2017</bibkey>
  </paper>

  <paper id="5713">
    <title>Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017</title>
    <author><first>Satoshi</first><last>Kinoshita</last></author>
    <author><first>Tadaaki</first><last>Oshio</last></author>
    <author><first>Tomoharu</first><last>Mitsuhashi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>140&#8211;145</pages>
    <url>http://www.aclweb.org/anthology/W17-5713</url>
    <abstract>Japio participates in the patent subtasks (JPC-EJ/JE/CJ/KJ) with phrase-based
	statistical machine translation (SMT) and neural machine translation (NMT)
	systems which are trained with its own patent corpora in addition to the
	subtask corpora provided by the organizers of WAT2017. In the EJ and CJ
	subtasks, SMT and NMT systems trained on about 50 million and 10 million
	sentence pairs, respectively, achieved comparable scores in the automatic
	evaluations, but the NMT systems were superior to the SMT systems in both
	the official and in-house human evaluations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kinoshita-oshio-mitsuhashi:2017:WAT2017</bibkey>
  </paper>

  <paper id="5714">
    <title>Kyoto University Participation to WAT 2017</title>
    <author><first>Fabien</first><last>Cromieres</last></author>
    <author><first>Raj</first><last>Dabre</last></author>
    <author><first>Toshiaki</first><last>Nakazawa</last></author>
    <author><first>Sadao</first><last>Kurohashi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>146&#8211;153</pages>
    <url>http://www.aclweb.org/anthology/W17-5714</url>
    <abstract>We describe here our approaches and results for the WAT 2017 shared
	  translation tasks. Following our good results with neural machine
	  translation in the previous shared task, we continue this approach this
	  year, with incremental improvements to models and training methods. We
	  focused on the ASPEC dataset and were able to improve the state-of-the-art
	  results for Chinese-to-Japanese and Japanese-to-Chinese translation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>cromieres-EtAl:2017:WAT2017</bibkey>
  </paper>

  <paper id="5715">
    <title>CUNI NMT System for WAT 2017 Translation Tasks</title>
    <author><first>Tom</first><last>Kocmi</last></author>
    <author><first>Du&#x161;an</first><last>Vari&#x161;</last></author>
    <author><first>Ond&#x159;ej</first><last>Bojar</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>154&#8211;159</pages>
    <url>http://www.aclweb.org/anthology/W17-5715</url>
    <abstract>The paper presents this year's CUNI submissions to the WAT 2017 Translation
	Task, focusing on Japanese-English translation, namely the Scientific Papers,
	Patents and Newswire subtasks. We compare two neural network architectures,
	the standard sequence-to-sequence model with attention (Seq2Seq) and an
	architecture using a convolutional sentence encoder (FBConv2Seq), both
	implemented in Neural Monkey, an NMT framework that we are participating in
	developing. We also compare various types of preprocessing of the source
	Japanese sentences and their impact on the overall results. Furthermore, we
	include the results of our experiments with out-of-domain data obtained by
	combining the corpora provided for each subtask.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kocmi-varivs-bojar:2017:WAT2017</bibkey>
  </paper>

  <paper id="5716">
    <title>Tokyo Metropolitan University Neural Machine Translation System for WAT 2017</title>
    <author><first>Yukio</first><last>Matsumura</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>160&#8211;166</pages>
    <url>http://www.aclweb.org/anthology/W17-5716</url>
    <abstract>In this paper, we describe our neural machine translation (NMT) system, which
	is based on attention-based NMT and uses long short-term memory (LSTM) units
	as the RNN. We implemented beam search and ensemble decoding in the NMT
	system. The system was tested on the shared tasks of the 4th Workshop on
	Asian Translation (WAT 2017). We participated in the scientific paper
	subtasks, attempting the Japanese-English, English-Japanese, and
	Japanese-Chinese translation tasks. The experimental results showed that
	implementing beam search and ensemble decoding can effectively improve
	translation quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>matsumura-komachi:2017:WAT2017</bibkey>
  </paper>

  <paper id="5717">
    <title>Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation</title>
    <author><first>Sandhya</first><last>Singh</last></author>
    <author><first>Ritesh</first><last>Panjwani</last></author>
    <author><first>Anoop</first><last>Kunchukuttan</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the 4th Workshop on Asian Translation (WAT2017)</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>167&#8211;170</pages>
    <url>http://www.aclweb.org/anthology/W17-5717</url>
    <abstract>In this paper, we empirically compare two encoder-decoder neural machine
	translation architectures, the convolutional sequence-to-sequence model
	(ConvS2S) and the recurrent sequence-to-sequence model (RNNS2S), for the
	English-Hindi language pair, as part of IIT Bombay's submission to the
	WAT2017 shared task. We report results for both the English-Hindi and
	Hindi-English translation directions.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>singh-EtAl:2017:WAT2017</bibkey>
  </paper>

</volume>

