Ke M. Tran

Also published as: Ke Tran, Ke Tran Manh


2022

pdf bib
The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation
Tobias Domhan | Eva Hasler | Ke Tran | Sony Trenous | Bill Byrne | Felix Hieber
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Vocabulary selection, or lexical shortlisting, is a well-known technique to improve latency of Neural Machine Translation models by constraining the set of allowed output words during inference. The chosen set is typically determined by separately trained alignment model parameters, independent of the source-sentence context at inference time. While vocabulary selection appears competitive with respect to automatic quality metrics in prior work, we show that it can fail to select the right set of output words, particularly for semantically non-compositional linguistic phenomena such as idiomatic expressions, leading to reduced translation quality as perceived by humans. Trading off latency for quality by increasing the size of the allowed set is often not an option in real-world scenarios. We propose a model of vocabulary selection, integrated into the neural translation model, that predicts the set of allowed output words from contextualized encoder representations. This restores translation quality of an unconstrained system, as measured by human evaluations on WMT newstest2020 and idiomatic expressions, at an inference latency competitive with alignment-based selection using aggressive thresholds, thereby removing the dependency on separately trained alignment models.

2021

pdf bib
Improving the Quality Trade-Off for Neural Machine Translation Multi-Domain Adaptation
Eva Hasler | Tobias Domhan | Jonay Trenous | Ke Tran | Bill Byrne | Felix Hieber
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Building neural machine translation systems to perform well on a specific target domain is a well-studied problem. Optimizing system performance for multiple, diverse target domains however remains a challenge. We study this problem in an adaptation setting where the goal is to preserve the existing system quality while incorporating data for domains that were not the focus of the original translation system. We find that we can improve over the performance trade-off offered by Elastic Weight Consolidation with a relatively simple data mixing strategy. At comparable performance on the new domains, catastrophic forgetting is mitigated significantly on strong WMT baselines. Combining both approaches improves the Pareto frontier on this task.

2020

pdf bib
Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations
Ke Tran | Ming Tan
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Modern conversational AI systems support natural language understanding for a wide variety of capabilities. While a majority of these tasks can be accomplished using a simple and flat representation of intents and slots, more sophisticated capabilities require complex hierarchical representations supported by semantic parsing. State-of-the-art semantic parsers are trained using supervised learning with data labeled according to a hierarchical schema which might be costly to obtain or not readily available for a new domain. In this work, we explore the possibility of generating synthetic data for neural semantic parsing using a pretrained denoising sequence-to-sequence model (i.e., BART). Specifically, we first extract masked templates from the existing labeled utterances, and then fine-tune BART to generate synthetic utterances conditioning on the extracted templates. Finally, we use an auxiliary parser (AP) to filter the generated utterances. The AP guarantees the quality of the generated data. We show the potential of our approach when evaluating on the Facebook TOP dataset for navigation domain.

2019

pdf bib
Zero-shot Dependency Parsing with Pre-trained Multilingual Sentence Representations
Ke Tran | Arianna Bisazza
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

We investigate whether off-the-shelf deep bidirectional sentence representations (Devlin et al., 2019) trained on a massively multilingual corpus (multilingual BERT) enable the development of an unsupervised universal dependency parser. This approach only leverages a mix of monolingual corpora in many languages and does not require any translation data making it applicable to low-resource languages. In our experiments we outperform the best CoNLL 2018 language-specific systems in all of the shared task’s six truly low-resource languages while using a single system. However, we also find that (i) parsing accuracy still varies dramatically when changing the training languages and (ii) in some target languages zero-shot transfer fails under all tested conditions, raising concerns on the ‘universality’ of the whole approach.

2018

pdf bib
Inducing Grammars with and for Neural Machine Translation
Yonatan Bisk | Ke Tran
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Machine translation systems require semantic knowledge and grammatical understanding. Neural machine translation (NMT) systems often assume this information is captured by an attention mechanism and a decoder that ensures fluency. Recent work has shown that incorporating explicit syntax alleviates the burden of modeling both types of knowledge. However, requiring parses is expensive and does not explore the question of what syntax a model needs during translation. To address both of these issues we introduce a model that simultaneously translates while inducing dependency trees. In this way, we leverage the benefits of structure while investigating what syntax NMT must induce to maximize performance. We show that our dependency trees are 1. language pair dependent and 2. improve translation quality.

pdf bib
The Importance of Being Recurrent for Modeling Hierarchical Structure
Ke Tran | Arianna Bisazza | Christof Monz
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent work has shown that recurrent neural networks (RNNs) can implicitly capture and exploit hierarchical information when trained to solve common natural language processing tasks (Blevins et al., 2018) such as language modeling (Linzen et al., 2016; Gulordava et al., 2018) and neural machine translation (Shi et al., 2016). In contrast, the ability to model structured data with non-recurrent neural networks has received little attention despite their success in many NLP tasks (Gehring et al., 2017; Vaswani et al., 2017). In this work, we compare the two architectures—recurrent versus non-recurrent—with respect to their ability to model hierarchical structure and find that recurrency is indeed important for this purpose. The code and data used in our experiments is available at https://github.com/ ketranm/fan_vs_rnn

2016

pdf bib
Recurrent Memory Networks for Language Modeling
Ke Tran | Arianna Bisazza | Christof Monz
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Dataset and Evaluation Metrics for Abstractive Compression of Sentences and Short Paragraphs
Kristina Toutanova | Chris Brockett | Ke M. Tran | Saleema Amershi
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Unsupervised Neural Hidden Markov Models
Ke M. Tran | Yonatan Bisk | Ashish Vaswani | Daniel Marcu | Kevin Knight
Proceedings of the Workshop on Structured Prediction for NLP

2015

pdf bib
A distributed inflection model for translating into morphologically rich languages
Ke Tran | Arianna Bisazza | Christof Monz
Proceedings of Machine Translation Summit XV: Papers

2014

pdf bib
Word Translation Prediction for Morphologically Rich Languages with Bilingual Neural Networks
Ke M. Tran | Arianna Bisazza | Christof Monz
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

pdf bib
Making Readability Indices Readable
Sara Tonelli | Ke Tran Manh | Emanuele Pianta
Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations