Kenton Murray


2021

pdf bib
Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction
Mahsa Yarmohammadi | Shijie Wu | Marc Marone | Haoran Xu | Seth Ebner | Guanghui Qin | Yunmo Chen | Jialiang Guo | Craig Harman | Kenton Murray | Aaron Steven White | Mark Dredze | Benjamin Van Durme
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of “train on English, run on any language”, we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.

pdf bib
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
Haoran Xu | Benjamin Van Durme | Kenton Murray
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation (NMT) systems. However, proposed methods for incorporating pre-trained models are non-trivial and mainly focus on BERT, which lacks a comparison of the impact that other pre-trained models may have on translation performance. In this paper, we demonstrate that simply using the output (contextualized embeddings) of a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) as the input of the NMT encoder achieves state-of-the-art translation performance. Moreover, we also propose a stochastic layer selection approach and a concept of a dual-directional translation model to ensure the sufficient utilization of contextualized embeddings. In the case of without using back translation, our best models achieve BLEU scores of 30.45 for En→De and 38.61 for De→En on the IWSLT’14 dataset, and 31.26 for En→De and 34.94 for De→En on the WMT’14 dataset, which exceeds all published numbers.

pdf bib
Gradual Fine-Tuning for Low-Resource Domain Adaptation
Haoran Xu | Seth Ebner | Mahsa Yarmohammadi | Aaron Steven White | Benjamin Van Durme | Kenton Murray
Proceedings of the Second Workshop on Domain Adaptation for NLP

Fine-tuning is known to improve NLP models by adapting an initial model trained on more plentiful but less domain-salient examples to data in a target domain. Such domain adaptation is typically done using one stage of fine-tuning. We demonstrate that gradually fine-tuning in a multi-step process can yield substantial further gains and can be applied without modifying the model or learning objective.

pdf bib
Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution
Toan Q. Nguyen | Kenton Murray | David Chiang
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely the cause for concatenation improving BLEU by about +1 across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.

2020

pdf bib
Collecting Verified COVID-19 Question Answer Pairs
Adam Poliak | Max Fleming | Cash Costello | Kenton Murray | Mahsa Yarmohammadi | Shivani Pandya | Darius Irani | Milind Agarwal | Udit Sharma | Shuo Sun | Nicola Ivanov | Lingxi Shang | Kaushik Srinivasan | Seolhwa Lee | Xu Han | Smisha Agarwal | João Sedoc
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

We release a dataset of over 2,100 COVID19 related Frequently asked Question-Answer pairs scraped from over 40 trusted websites. We include an additional 24, 000 questions pulled from online sources that have been aligned by experts with existing answered questions from our dataset. This paper describes our efforts in collecting the dataset and summarizes the resulting data. Our dataset is automatically updated daily and available at https://github.com/JHU-COVID-QA/ scraping-qas. So far, this data has been used to develop a chatbot providing users information about COVID-19. We encourage others to build analytics and tools upon this dataset as well.

pdf bib
The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education
Huda Khayrallah | Jacob Bremerman | Arya D. McCarthy | Kenton Murray | Winston Wu | Matt Post
Proceedings of the Fourth Workshop on Neural Generation and Translation

This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE). We participated in all five language tasks, placing first in each. Our approach involved a language-agnostic pipeline of three components: (1) building strong machine translation systems on general-domain data, (2) fine-tuning on Duolingo-provided data, and (3) generating n-best lists which are then filtered with various score-based techniques. In addi- tion to the language-agnostic pipeline, we attempted a number of linguistically-motivated approaches, with, unfortunately, little success. We also find that improving BLEU performance of the beam-search generated translation does not necessarily improve on the task metric—weighted macro F1 of an n-best list.

2019

pdf bib
Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
Kenton Murray | Jeffery Kinnison | Toan Q. Nguyen | Walter Scheirer | David Chiang
Proceedings of the 3rd Workshop on Neural Generation and Translation

Neural sequence-to-sequence models, particularly the Transformer, are the state of the art in machine translation. Yet these neural networks are very sensitive to architecture and hyperparameter settings. Optimizing these settings by grid or random search is computationally expensive because it requires many training runs. In this paper, we incorporate architecture search into a single training run through auto-sizing, which uses regularization to delete neurons in a network over the course of training. On very low-resource language pairs, we show that auto-sizing can improve BLEU scores by up to 3.9 points while removing one-third of the parameters from the model.

pdf bib
Efficiency through Auto-Sizing: Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task
Kenton Murray | Brian DuSell | David Chiang
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper describes the Notre Dame Natural Language Processing Group’s (NDNLP) submission to the WNGT 2019 shared task (Hayashi et al., 2019). We investigated the impact of auto-sizing (Murray and Chiang, 2015; Murray et al., 2019) to the Transformer network (Vaswani et al., 2017) with the goal of substantially reducing the number of parameters in the model. Our method was able to eliminate more than 25% of the model’s parameters while suffering a decrease of only 1.1 BLEU.

2018

pdf bib
Correcting Length Bias in Neural Machine Translation
Kenton Murray | David Chiang
Proceedings of the Third Conference on Machine Translation: Research Papers

We study two problems in neural machine translation (NMT). First, in beam search, whereas a wider beam should in principle help translation, it often hurts NMT. Second, NMT has a tendency to produce translations that are too short. Here, we argue that these problems are closely related and both rooted in label bias. We show that correcting the brevity problem almost eliminates the beam problem; we compare some commonly-used methods for doing this, finding that a simple per-word reward works well; and we introduce a simple and quick way to tune this reward using the perceptron algorithm.

2015

pdf bib
Auto-Sizing Neural Networks: With Applications to n-gram Language Models
Kenton Murray | David Chiang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2013

pdf bib
The CMU Machine Translation Systems at WMT 2013: Syntax, Synthetic Translation Options, and Pseudo-References
Waleed Ammar | Victor Chahuneau | Michael Denkowski | Greg Hanneman | Wang Ling | Austin Matthews | Kenton Murray | Nicola Segall | Alon Lavie | Chris Dyer
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
QCRI at IWSLT 2013: experiments in Arabic-English and English-Arabic spoken language translation
Hassan Sajjad | Francisco Guzmán | Preslav Nakov | Ahmed Abdelali | Kenton Murray | Fahad Al Obaidli | Stephan Vogel
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

We describe the Arabic-English and English-Arabic statistical machine translation systems developed by the Qatar Computing Research Institute for the IWSLT’2013 evaluation campaign on spoken language translation. We used one phrase-based and two hierarchical decoders, exploring various settings thereof. We further experimented with three domain adaptation methods, and with various Arabic word segmentation schemes. Combining the output of several systems yielded a gain of up to 3.4 BLEU points over the baseline. Here we also describe a specialized normalization scheme for evaluating Arabic output, which was adopted for the IWSLT’2013 evaluation campaign.