<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W16">
  <paper id="4600">
    <title>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</title>
    <editor>Toshiaki Nakazawa</editor>
    <editor>Hideya Mino</editor>
    <editor>Chenchen Ding</editor>
    <editor>Isao Goto</editor>
    <editor>Graham Neubig</editor>
    <editor>Sadao Kurohashi</editor>
    <editor>Ir. Hammam Riza</editor>
    <editor>Pushpak Bhattacharyya</editor>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <url>http://aclweb.org/anthology/W16-46</url>
    <bibtype>book</bibtype>
    <bibkey>WAT2016:2016</bibkey>
  </paper>

  <paper id="4601">
    <title>Overview of the 3rd Workshop on Asian Translation</title>
    <author><first>Toshiaki</first><last>Nakazawa</last></author>
    <author><first>Chenchen</first><last>Ding</last></author>
    <author><first>Hideya</first><last>Mino</last></author>
    <author><first>Isao</first><last>Goto</last></author>
    <author><first>Graham</first><last>Neubig</last></author>
    <author><first>Sadao</first><last>Kurohashi</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>1&#8211;46</pages>
    <url>http://aclweb.org/anthology/W16-4601</url>
    <abstract>This paper presents the results of the shared tasks from the 3rd Workshop on
	Asian Translation (WAT2016), including J &#60;-&#62; E and J &#60;-&#62; C scientific paper
	translation subtasks; C &#60;-&#62; J, K &#60;-&#62; J, and E &#60;-&#62; J patent translation subtasks;
	I &#60;-&#62; E newswire subtasks; and H &#60;-&#62; E and H &#60;-&#62; J mixed-domain subtasks. For
	WAT2016, 15 institutions participated in the shared tasks. About 500
	translation results were submitted to the automatic evaluation server, and
	selected submissions were manually evaluated.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>nakazawa-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4602">
    <title>Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation</title>
    <author><first>Zi</first><last>Long</last></author>
    <author><first>Takehito</first><last>Utsuro</last></author>
    <author><first>Tomoharu</first><last>Mitsuhashi</last></author>
    <author><first>Mikio</first><last>Yamamoto</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>47&#8211;57</pages>
    <url>http://aclweb.org/anthology/W16-4602</url>
    <abstract>Neural machine translation (NMT), a new approach to machine
	 translation, has achieved promising results comparable to those of
	 traditional approaches such as statistical machine translation
	 (SMT). Despite its recent success, NMT cannot handle a large
	 vocabulary because training and decoding complexity increase in
	 proportion to the number of target words. This problem becomes even
	 more serious when translating patent documents, which contain many
	 technical terms that are observed infrequently. In NMT,
	 out-of-vocabulary words are represented by a single unknown
	 token. In this paper, we propose a method that enables NMT to
	 translate patent sentences comprising a large vocabulary of technical
	 terms. We train an NMT system on bilingual data wherein technical terms
	 are replaced with technical term tokens; this allows it to translate
	 most of the source sentences except technical terms. Further, we use it
	 as a decoder to translate source sentences with technical term tokens
	 and replace the tokens with technical term translations using SMT. We
	 also use it to rerank the 1,000-best SMT translations on the basis of
	 the average of the SMT score and that of the NMT rescoring of the
	 translated sentences with technical term tokens. Our experiments on
	 Japanese-Chinese patent sentences show that the proposed NMT system
	 achieves a substantial improvement of up to 3.1 BLEU points and 2.3
	 RIBES points over traditional SMT systems and an improvement of
	 approximately 0.6 BLEU points and 0.8 RIBES points over an equivalent
	 NMT system without our proposed technique.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>long-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4603">
    <title>Japanese-English Machine Translation of Recipe Texts</title>
    <author><first>Takayuki</first><last>Sato</last></author>
    <author><first>Jun</first><last>Harashima</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>58&#8211;67</pages>
    <url>http://aclweb.org/anthology/W16-4603</url>
    <abstract>Concomitant with the globalization of food culture, demand for the recipes of
	specialty dishes has been increasing. The recent growth in recipe sharing
	websites and food blogs has resulted in numerous recipe texts being available
	for diverse foods in various languages. However, little work has been done on
	machine translation of recipe texts. In this paper, we address the task of
	translating recipes and investigate the advantages and disadvantages of
	traditional phrase-based statistical machine translation and more recent
	neural machine translation. Specifically, we translate Japanese recipes into
	English, analyze errors in the translated recipes, and discuss room for
	improvement.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sato-harashima-komachi:2016:WAT2016</bibkey>
  </paper>

  <paper id="4604">
    <title>IIT Bombay’s English-Indonesian submission at WAT: Integrating Neural Language Models with SMT</title>
    <author><first>Sandhya</first><last>Singh</last></author>
    <author><first>Anoop</first><last>Kunchukuttan</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>68&#8211;74</pages>
    <url>http://aclweb.org/anthology/W16-4604</url>
    <abstract>This paper describes the IIT Bombay submission as a part of the shared task
	in WAT 2016 for the English--Indonesian language pair. The results reported
	here are for both directions of the language pair. Among the various
	approaches experimented with, the Operation Sequence Model (OSM) and a Neural
	Language Model were submitted for WAT. The OSM approach integrates the
	translation and reordering processes, resulting in relatively improved
	translation. Similarly, the neural experiment integrates a Neural Language
	Model with Statistical Machine Translation (SMT) as a feature for translation.
	The Neural Probabilistic Language Model (NPLM) gave relatively high BLEU
	scores for the Indonesian-to-English translation system, while the Neural
	Network Joint Model (NNJM) performed better for the English-to-Indonesian
	direction. The results indicate improvements over the baseline phrase-based
	SMT of 0.61 BLEU points for the English-Indonesian system and 0.55 BLEU
	points for the Indonesian-English translation system.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>singh-kunchukuttan-bhattacharyya:2016:WAT2016</bibkey>
  </paper>

  <paper id="4605">
    <title>Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation</title>
    <author><first>Kazuma</first><last>Hashimoto</last></author>
    <author><first>Akiko</first><last>Eriguchi</last></author>
    <author><first>Yoshimasa</first><last>Tsuruoka</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>75&#8211;83</pages>
    <url>http://aclweb.org/anthology/W16-4605</url>
    <abstract>This paper describes our UT-KAY system that participated in the Workshop on
	Asian Translation 2016. Based on an Attention-based Neural Machine Translation
	(ANMT) model, we build our system by incorporating a domain adaptation method
	for multiple domains and an attention-based unknown word replacement method. In
	experiments, we verify that the attention-based unknown word replacement method
	is effective in improving translation scores in Chinese-to-Japanese machine
	translation. We further show results of manual analysis on the replaced unknown
	words.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>hashimoto-eriguchi-tsuruoka:2016:WAT2016</bibkey>
  </paper>

  <paper id="4606">
    <title>Global Pre-ordering for Improving Sublanguage Translation</title>
    <author><first>Masaru</first><last>Fuji</last></author>
    <author><first>Masao</first><last>Utiyama</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <author><first>Yuji</first><last>Matsumoto</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>84&#8211;93</pages>
    <url>http://aclweb.org/anthology/W16-4606</url>
    <abstract>When translating formal documents, capturing the sentence structure specific to
	the sublanguage is essential for obtaining high-quality translations.
	This paper proposes a novel global reordering method with particular focus on
	long-distance reordering for capturing the global sentence structure of a
	sublanguage. The proposed method learns global reordering models from a
	non-annotated parallel corpus and works in conjunction with conventional
	syntactic reordering. Experimental results on the patent abstract sublanguage
	show substantial gains of more than 25 points in the RIBES metric and
	comparable BLEU scores both for Japanese-to-English and English-to-Japanese
	translations.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>fuji-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4607">
    <title>Neural Reordering Model Considering Phrase Translation and Word Alignment for Phrase-based Translation</title>
    <author><first>Shin</first><last>Kanouchi</last></author>
    <author><first>Katsuhito</first><last>Sudoh</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>94&#8211;103</pages>
    <url>http://aclweb.org/anthology/W16-4607</url>
    <abstract>This paper presents an improved lexicalized reordering model for phrase-based
	statistical machine translation using a deep neural network.
	Lexicalized reordering suffers from reordering ambiguity, data sparseness, and
	noise in the phrase table.
	The previous neural reordering model successfully solves the first and second
	problems but fails to address the third.
	Therefore, we propose new features using phrase translation and word alignment
	to construct phrase vectors that handle inherently noisy phrase translation
	pairs.
	The experimental results show that our proposed method improves the accuracy of
	phrase reordering.
	We confirm that the proposed method works well with phrase pairs including NULL
	alignments.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kanouchi-sudoh-komachi:2016:WAT2016</bibkey>
  </paper>

  <paper id="4608">
    <title>System Description of bjtu_nlp Neural Machine Translation System</title>
    <author><first>Shaotong</first><last>Li</last></author>
    <author><first>JinAn</first><last>Xu</last></author>
    <author><first>Yufeng</first><last>Chen</last></author>
    <author><first>Yujie</first><last>Zhang</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>104&#8211;110</pages>
    <url>http://aclweb.org/anthology/W16-4608</url>
    <abstract>This paper presents our machine translation system developed for the
	WAT2016 evaluation tasks of ja-en, ja-zh, en-ja, zh-ja, JPCja-en, JPCja-zh,
	JPCen-ja, and JPCzh-ja. We build our system on the encoder--decoder framework
	by integrating a recurrent neural network (RNN) with gated recurrent units
	(GRU), and we also adopt an attention mechanism to address the problem of
	information loss. Additionally, we propose a simple translation-specific
	approach to resolve the unknown word translation problem. Experimental results
	show that our system performs better than the baseline statistical machine
	translation (SMT) systems in each task. Moreover, they show that our proposed
	approach to unknown word translation effectively improves translation
	results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>li-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4609">
    <title>Translation systems and experimental results of the EHR group for WAT2016 tasks</title>
    <author><first>Terumasa</first><last>Ehara</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>111&#8211;118</pages>
    <url>http://aclweb.org/anthology/W16-4609</url>
    <abstract>The system architecture, experimental settings, and experimental results of the
	EHR group for the WAT2016 tasks are described. We participate in six tasks:
	en-ja, zh-ja, JPCzh-ja, JPCko-ja, HINDENen-hi, and HINDENhi-ja. Although the
	basic architecture of our systems is PBSMT with reordering, several additional
	techniques are applied. In particular, the system for the HINDENhi-ja task,
	which pivots through English, uses the reordering technique. Because Hindi and
	Japanese are both OV-type languages and English is a VO-type language, we can
	apply the reordering technique to the pivot language. With the reordering
	technique, we improve the BLEU score from 7.47 to 7.66 for the sentence-level
	pivoting of this task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ehara:2016:WAT2016</bibkey>
  </paper>

  <paper id="4610">
    <title>Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016</title>
    <author><first>Graham</first><last>Neubig</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>119&#8211;125</pages>
    <url>http://aclweb.org/anthology/W16-4610</url>
    <abstract>This year, the Nara Institute of Science and Technology (NAIST)/Carnegie Mellon
	University (CMU) submission to the Japanese-English translation track of the
	2016 Workshop on Asian Translation was based on attentional neural machine
	translation (NMT) models. In addition to the standard NMT model, we make a
	number of improvements, most notably the use of discrete translation lexicons
	to improve probability estimates, and the use of minimum risk training to
	optimize the MT system for BLEU score. As a result, our system achieved the
	highest translation evaluation scores for the task.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>neubig:2016:WAT2016</bibkey>
  </paper>

  <paper id="4611">
    <title>NICT-2 Translation System for WAT2016: Applying Domain Adaptation to Phrase-based Statistical Machine Translation</title>
    <author><first>Kenji</first><last>Imamura</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>126&#8211;132</pages>
    <url>http://aclweb.org/anthology/W16-4611</url>
    <abstract>This paper describes the NICT-2 translation system for the 3rd Workshop on
	Asian Translation. The proposed system employs a domain adaptation method
	based on feature augmentation. We regarded the Japan Patent Office Corpus as a
	mixture of four domain corpora and improved the translation quality of each
	domain. In addition, we incorporated language models constructed from Google
	n-grams as external knowledge. Our domain adaptation method can naturally
	incorporate such external knowledge that contributes to translation quality.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>imamura-sumita:2016:WAT2016</bibkey>
  </paper>

  <paper id="4612">
    <title>Translation Using JAPIO Patent Corpora: JAPIO at WAT2016</title>
    <author><first>Satoshi</first><last>Kinoshita</last></author>
    <author><first>Tadaaki</first><last>Oshio</last></author>
    <author><first>Tomoharu</first><last>Mitsuhashi</last></author>
    <author><first>Terumasa</first><last>Ehara</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>133&#8211;138</pages>
    <url>http://aclweb.org/anthology/W16-4612</url>
    <abstract>We participate in the scientific paper subtask (ASPEC-EJ/CJ) and the patent
	subtask (JPC-EJ/CJ/KJ) with phrase-based SMT systems trained on our own patent
	corpora. Using larger corpora than those prepared by the workshop organizer,
	we achieved higher BLEU scores than most participants in the EJ and CJ
	translations of the patent subtask, but in the crowdsourcing evaluation, our
	EJ translation, which is best in all automatic evaluations, received a very
	poor score. In the scientific paper subtask, our translations are given lower
	scores than most translations produced by translation engines trained on the
	in-domain corpora, but our scores are higher than those of general-purpose
	RBMTs and online services. Considering the result of the crowdsourcing
	evaluation, this suggests the possibility that a CJ SMT system trained on a
	large patent corpus can translate non-patent technical documents at a
	practical level.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>kinoshita-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4613">
    <title>An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation</title>
    <author><first>Xiaolin</first><last>Wang</last></author>
    <author><first>Andrew</first><last>Finch</last></author>
    <author><first>Masao</first><last>Utiyama</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>139&#8211;148</pages>
    <url>http://aclweb.org/anthology/W16-4613</url>
    <abstract>Simultaneous interpretation is a very challenging application of machine
	translation in which the input is a stream of words from a speech recognition
	engine. The key problem is how to segment the stream in an online manner into
	units suitable for translation. The segmentation process proceeds by
	calculating  a confidence score for each word that indicates the soundness of
	placing a sentence boundary after it, and then heuristics are employed to
	determine the position of the boundaries. Multiple variants of the confidence
	scoring method and segmentation heuristics were studied. Experimental results
	show that the best performing strategy is not only efficient in terms of
	average latency per word, but also achieves end-to-end translation quality
	close to that of an offline baseline and to oracle segmentation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>wang-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4614">
    <title>Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian</title>
    <author><first>Chenchen</first><last>Ding</last></author>
    <author><first>Masao</first><last>Utiyama</last></author>
    <author><first>Eiichiro</first><last>Sumita</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>149&#8211;156</pages>
    <url>http://aclweb.org/anthology/W16-4614</url>
    <abstract>This paper illustrates the similarity between Thai and Laotian, and between
	Malay and Indonesian, based on an investigation of raw parallel data from the
	Asian Language Treebank. The cross-lingual similarity is investigated and
	demonstrated on metrics of token correspondence and token order, based on
	several standard statistical machine translation techniques. The similarity
	shown in this study suggests the possibility of harmonized annotation and
	processing of these language pairs in future development.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>ding-utiyama-sumita:2016:WAT2016</bibkey>
  </paper>

  <paper id="4615">
    <title>Integrating empty category detection into preordering Machine Translation</title>
    <author><first>Shunsuke</first><last>Takeno</last></author>
    <author><first>Masaaki</first><last>Nagata</last></author>
    <author><first>Kazuhide</first><last>Yamamoto</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>157&#8211;165</pages>
    <url>http://aclweb.org/anthology/W16-4615</url>
    <abstract>We propose a method for integrating Japanese empty category detection into the 
	preordering process of Japanese-to-English statistical machine translation.
	First, we apply machine-learning-based empty category detection to estimate 
	the position and the type of empty categories in the constituent tree of the 
	source sentence.
	Then, we apply discriminative preordering to the augmented constituent tree in 
	which empty categories are treated as if they are normal lexical symbols.
	We find that it is effective to filter empty categories based on the 
	confidence of estimation.
	Our experiments show that, for the IWSLT dataset consisting of short travel
	conversations, the insertion of empty categories alone improves the BLEU score
	from 33.2 to 34.3 and the RIBES score from 76.3 to 78.7, which implies that
	reordering has improved.
	For the KFTT dataset consisting of Wikipedia sentences, the proposed
	preordering method considering empty categories improves the BLEU score from
	19.9 to 20.2 and the RIBES score from 66.2 to 66.3, which shows that both
	translation and reordering have improved slightly.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>takeno-nagata-yamamoto:2016:WAT2016</bibkey>
  </paper>

  <paper id="4616">
    <title>Kyoto University Participation to WAT 2016</title>
    <author><first>Fabien</first><last>Cromieres</last></author>
    <author><first>Chenhui</first><last>Chu</last></author>
    <author><first>Toshiaki</first><last>Nakazawa</last></author>
    <author><first>Sadao</first><last>Kurohashi</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>166&#8211;174</pages>
    <url>http://aclweb.org/anthology/W16-4616</url>
    <abstract>We describe here our approaches and results on the WAT 2016 shared translation
	tasks. We tried to use both an example-based machine translation (MT) system
	and a neural MT system. We report very good translation results, especially
	when using neural MT for Chinese-to-Japanese translation.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>cromieres-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4617">
    <title>Character-based Decoding in Tree-to-Sequence Attention-based Neural Machine Translation</title>
    <author><first>Akiko</first><last>Eriguchi</last></author>
    <author><first>Kazuma</first><last>Hashimoto</last></author>
    <author><first>Yoshimasa</first><last>Tsuruoka</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>175&#8211;183</pages>
    <url>http://aclweb.org/anthology/W16-4617</url>
    <abstract>This paper reports our systems (UT-AKY) submitted to the 3rd Workshop on Asian
	Translation (WAT 2016) and their results in the English-to-Japanese
	translation task.  Our model is based on the tree-to-sequence Attention-based
	NMT (ANMT) model proposed by Eriguchi et al. (2016).  We submitted two ANMT
	systems: one with a word-based decoder and the other with a character-based
	decoder.  Experimenting on the English-to-Japanese translation task, we have
	confirmed that the character-based decoder can cover almost the full vocabulary
	in the target language and generate translations much faster than the
	word-based model.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>eriguchi-hashimoto-tsuruoka:2016:WAT2016</bibkey>
  </paper>

  <paper id="4618">
    <title>Faster and Lighter Phrase-based Machine Translation Baseline</title>
    <author><first>Liling</first><last>Tan</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>184&#8211;193</pages>
    <url>http://aclweb.org/anthology/W16-4618</url>
    <abstract>This paper describes the SENSE machine translation system participation in the
	Third Workshop on Asian Translation (WAT2016). We share our best practices for
	building fast and light phrase-based machine translation (PBMT) models that
	have results comparable to the baseline systems provided by the organizers. As
	Neural Machine Translation (NMT) overtakes PBMT as the state-of-the-art, deep
	learning and new MT practitioners might not be familiar with the PBMT
	paradigm, and we hope that this paper will help them build a PBMT baseline
	system quickly and easily.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tan:2016:WAT2016</bibkey>
  </paper>

  <paper id="4619">
    <title>Improving Patent Translation using Bilingual Term Extraction and Re-tokenization for Chinese&#8211;Japanese</title>
    <author><first>Wei</first><last>Yang</last></author>
    <author><first>Yves</first><last>Lepage</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>194&#8211;202</pages>
    <url>http://aclweb.org/anthology/W16-4619</url>
    <abstract>Unlike European languages, many Asian languages like Chinese and Japanese do
	not have typographic boundaries in their writing systems. Word segmentation
	(tokenization), which breaks sentences down into individual words (tokens), is
	normally treated as the first step for machine translation (MT). For Chinese
	and Japanese, different rules and segmentation tools lead to segmentation
	results with different levels of granularity between the two languages. To
	improve the translation accuracy, we adjust and balance the granularity of the
	segmentation results around terms in a Chinese--Japanese patent corpus used
	for training the translation model. In this paper, we describe a statistical
	machine translation (SMT) system built on a re-tokenized Chinese--Japanese
	patent training corpus using extracted bilingual multi-word terms.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yang-lepage:2016:WAT2016</bibkey>
  </paper>

  <paper id="4620">
    <title>Controlling the Voice of a Sentence in Japanese-to-English Neural Machine Translation</title>
    <author><first>Hayahide</first><last>Yamagishi</last></author>
    <author><first>Shin</first><last>Kanouchi</last></author>
    <author><first>Takayuki</first><last>Sato</last></author>
    <author><first>Mamoru</first><last>Komachi</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>203&#8211;210</pages>
    <url>http://aclweb.org/anthology/W16-4620</url>
    <abstract>In machine translation, we must consider the difference in expression between
	languages. For example, the active/passive voice may change in Japanese-English
	translation. The same verb in Japanese may be translated into different voices
	at each translation because the voice of a generated sentence cannot be
	determined using only the information of the Japanese sentence. Machine
	translation systems should consider the information structure to improve the
	coherence of the output by using several topicalization techniques such as
	passivization.
	Therefore, this paper reports on our attempt to control the voice of the
	sentence generated by an encoder-decoder model. To control the voice of the
	generated sentence, we added the voice information of the target sentence to
	the source sentence during the training. We then generated sentences with a
	specified voice by appending the voice information to the source sentence. We
	observed experimentally whether the voice could be controlled. The results
	showed that we could control the voice of the generated sentence with 85.0%
	accuracy on average. In the evaluation of Japanese-English translation, we
	obtained a 0.73-point improvement in BLEU score by using gold voice labels.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>yamagishi-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4621">
    <title>Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016</title>
    <author><first>Katsuhito</first><last>Sudoh</last></author>
    <author><first>Masaaki</first><last>Nagata</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>211&#8211;215</pages>
    <url>http://aclweb.org/anthology/W16-4621</url>
    <abstract>This paper presents our Chinese-to-Japanese patent machine translation system
	for WAT 2016 (Group ID: ntt) that uses syntactic pre-ordering over Chinese
	dependency structures. Chinese words are reordered by a learning-to-rank model
	based on pairwise classification to obtain word order close to Japanese. In
	this year’s system, two different machine translation methods are compared:
	traditional phrase-based statistical machine translation and recent
	sequence-to-sequence neural machine translation with an attention mechanism.
	Our pre-ordering showed a significant improvement over the phrase-based
	baseline, but, in contrast, it degraded the neural machine translation
	baseline.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sudoh-nagata:2016:WAT2016</bibkey>
  </paper>

  <paper id="4622">
    <title>IITP English-Hindi Machine Translation System at WAT 2016</title>
    <author><first>Sukanta</first><last>Sen</last></author>
    <author><first>Debajyoty</first><last>Banik</last></author>
    <author><first>Asif</first><last>Ekbal</last></author>
    <author><first>Pushpak</first><last>Bhattacharyya</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>216&#8211;222</pages>
    <url>http://aclweb.org/anthology/W16-4622</url>
    <abstract>In this paper we describe the system that we develop as part of our
	participation in WAT 2016. We develop a system based on hierarchical
	phrase-based SMT for the English-to-Hindi language pair. We perform
	re-ordering and augment the bilingual dictionary to improve the performance.
	As a baseline, we use a phrase-based SMT model. The MT models are fine-tuned
	on the development set, and the best configurations are used to report the
	evaluation on the test set. Experiments show a BLEU score of 13.71 on the
	benchmark test data, which is better than the official baseline BLEU score of
	10.79.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>sen-EtAl:2016:WAT2016</bibkey>
  </paper>

  <paper id="4623">
    <title>Residual Stacking of RNNs for Neural Machine Translation</title>
    <author><first>Raphael</first><last>Shu</last></author>
    <author><first>Akiva</first><last>Miura</last></author>
    <booktitle>Proceedings of the 3rd Workshop on Asian Translation (WAT2016)</booktitle>
    <month>December</month>
    <year>2016</year>
    <address>Osaka, Japan</address>
    <publisher>The COLING 2016 Organizing Committee</publisher>
    <pages>223&#8211;229</pages>
    <url>http://aclweb.org/anthology/W16-4623</url>
    <abstract>To enhance Neural Machine Translation models, several obvious approaches can
	be considered, such as enlarging the hidden size of the recurrent layers and
	stacking multiple RNN layers. Surprisingly, we observe that using naively
	stacked RNNs in the decoder slows down the training and leads to degradation
	in performance. In this paper, we demonstrate that applying residual
	connections in the depth of stacked RNNs can help the optimization, which we
	refer to as residual stacking. In empirical evaluation, residual stacking of
	decoder RNNs gives superior results compared to other methods of enhancing the
	model with a fixed parameter budget. Our submitted systems in WAT2016 are
	based on an ensemble of NMT models with residual stacking in the decoder. To
	further improve the performance, we also attempt various methods of system
	combination in our experiments.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>shu-miura:2016:WAT2016</bibkey>
  </paper>

</volume>

