Philip Arthur


pdf bib
ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair
Alham Fikri Aji | Radityo Eko Prasojo Tirana Noor Fatyanosa | Philip Arthur | Suci Fitriany | Salma Qonitah | Nadhifa Zulfa | Tomi Santoso | Mahendra Data
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

pdf bib
Multilingual Simultaneous Neural Machine Translation
Philip Arthur | Dongwon Ryu | Gholamreza Haffari
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning
Philip Arthur | Trevor Cohn | Gholamreza Haffari
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We present a novel approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies. First, we present an algorithmic oracle to produce oracle READ/WRITE actions for training bilingual sentence-pairs using the notion of word alignments. This oracle actions are designed to capture enough information from the partial input before writing the output. Next, we perform a coupled scheduled sampling to effectively mitigate the exposure bias when learning both policies jointly with imitation learning. Experiments on six language-pairs show our method outperforms strong baselines in terms of translation quality quality while keeping the delay low.

pdf bib
It Is Not As Good As You Think! Evaluating Simultaneous Machine Translation on Interpretation Data
Jinming Zhao | Philip Arthur | Gholamreza Haffari | Trevor Cohn | Ehsan Shareghi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Most existing simultaneous machine translation (SiMT) systems are trained and evaluated on offline translation corpora. We argue that SiMT systems should be trained and tested on real interpretation data. To illustrate this argument, we propose an interpretation test set and conduct a realistic evaluation of SiMT trained on offline translations. Our results, on our test set along with 3 existing smaller scale language pairs, highlight the difference of up-to 13.83 BLEU score when SiMT models are evaluated on translation vs interpretation data. In the absence of interpretation training data, we propose a translation-to-interpretation (T2I) style transfer method which allows converting existing offline translations into interpretation-style data, leading to up-to 2.8 BLEU improvement. However, the evaluation gap remains notable, calling for constructing large-scale interpretation corpora better suited for evaluating and developing SiMT systems.


pdf bib
XNMT: The eXtensible Neural Machine Translation Toolkit
Graham Neubig | Matthias Sperber | Xinyi Wang | Matthieu Felix | Austin Matthews | Sarguna Padmanabhan | Ye Qi | Devendra Sachan | Philip Arthur | Pierre Godard | John Hewitt | Rachid Riad | Liming Wang
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)


pdf bib
Neural Machine Translation via Binary Code Prediction
Yusuke Oda | Philip Arthur | Graham Neubig | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose a new method for calculating the output layer in neural machine translation systems. The method is based on predicting a binary code for each word and can reduce computation time/memory requirements of the output layer to be logarithmic in vocabulary size in the best case. In addition, we also introduce two advanced approaches to improve the robustness of the proposed model: using error-correcting codes and combining softmax and binary codes. Experiments on two English-Japanese bidirectional translation tasks show proposed models achieve BLEU scores that approach the softmax, while reducing memory usage to the order of less than 1/10 and improving decoding speed on CPUs by x5 to x10.


pdf bib
Incorporating Discrete Translation Lexicons into Neural Machine Translation
Philip Arthur | Graham Neubig | Satoshi Nakamura
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing


pdf bib
Semantic Parsing of Ambiguous Input through Paraphrasing and Verification
Philip Arthur | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Transactions of the Association for Computational Linguistics, Volume 3

We propose a new method for semantic parsing of ambiguous and ungrammatical input, such as search queries. We do so by building on an existing semantic parsing framework that uses synchronous context free grammars (SCFG) to jointly model the input sentence and output meaning representation. We generalize this SCFG framework to allow not one, but multiple outputs. Using this formalism, we construct a grammar that takes an ambiguous input string and jointly maps it into both a meaning representation and a natural language paraphrase that is less ambiguous than the original input. This paraphrase can be used to disambiguate the meaning representation via verification using a language model that calculates the probability of each paraphrase.

pdf bib
Multi-Target Machine Translation with Multi-Synchronous Context-free Grammars
Graham Neubig | Philip Arthur | Kevin Duh
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies