Rationales for Sequential Predictions
Keyon Vafa | Yuntian Deng | David Blei | Alexander Rush
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain. We consider model explanations through rationales, subsets of context that can explain individual model predictions. We find sequential rationales by solving a combinatorial optimization: the best rationale is the smallest subset of input tokens that would predict the same output as the full sequence. Enumerating all subsets is intractable, so we propose an efficient greedy algorithm to approximate this objective. The algorithm, which is called greedy rationalization, applies to any model. For this approach to be effective, the model should form compatible conditional distributions when making predictions on incomplete subsets of the context. This condition can be enforced with a short fine-tuning step. We study greedy rationalization on language modeling and machine translation. Compared to existing baselines, greedy rationalization is best at optimizing the sequential objective and provides the most faithful rationales. On a new dataset of annotated sequential rationales, greedy rationales are most similar to human rationales.
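
The greedy search described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical model interface (a callable that maps a list of context tokens to a dictionary of next-token probabilities) and a selection rule of "add the token that most raises the probability of the full-context prediction"; both are illustrative assumptions, not the authors' released implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface (an assumption for this sketch): a model maps a list
# of context tokens to a dict of next-token probabilities.
Model = Callable[[Sequence[str]], Dict[str, float]]


def greedy_rationale(model: Model, context: Sequence[str]) -> List[int]:
    """Return indices of a small context subset that yields the same
    top prediction as the full context, found greedily."""
    full = model(context)
    target = max(full, key=full.get)  # prediction made from the full context

    chosen: List[int] = []
    remaining = set(range(len(context)))
    while remaining:
        # Score each candidate by the target probability after adding it.
        def score(i: int) -> float:
            subset = [context[j] for j in sorted(chosen + [i])]
            return model(subset).get(target, 0.0)

        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)

        # Stop as soon as the subset alone predicts the target.
        probs = model([context[j] for j in sorted(chosen)])
        if max(probs, key=probs.get) == target:
            break
    return sorted(chosen)


def toy_model(tokens: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for a sequence model: predicts 'barks' only if 'dog' is present."""
    if "dog" in tokens:
        return {"barks": 0.9, "the": 0.1}
    return {"barks": 0.2, "the": 0.8}


rationale = greedy_rationale(toy_model, ["the", "big", "dog"])
```

In this toy setting the rationale is the single index of "dog", since that token alone reproduces the full-context prediction. Each greedy step costs one model call per remaining token, which is why the search is tractable where subset enumeration is not.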

Sequence-to-Lattice Models for Fast Translation
Yuntian Deng | Alexander Rush
Findings of the Association for Computational Linguistics: EMNLP 2021

Non-autoregressive machine translation (NAT) approaches enable fast generation by utilizing parallelizable generative processes. The remaining bottleneck in these models is their decoder layers; unfortunately, unlike in autoregressive models (Kasai et al., 2020), removing decoder layers from NAT models significantly degrades accuracy. This work proposes a sequence-to-lattice model that replaces the decoder with a search lattice. Our approach first constructs a candidate lattice using efficient lookup operations, generates lattice scores from a deep encoder, and finally finds the best path using dynamic programming. Experiments on three machine translation datasets show that our method is faster than past non-autoregressive generation approaches, and more accurate than naively reducing the number of decoder layers.
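
The final step, finding the best path through a scored lattice with dynamic programming, can be sketched as a Viterbi-style search. The lattice layout used here (one list of candidate tokens per output position, with per-node and per-edge scores) is an assumption made for illustration; the abstract does not specify the data structure, and in the paper the scores would come from the encoder rather than being hand-written.

```python
from typing import List, Tuple


def best_path(
    candidates: List[List[str]],
    node_scores: List[List[float]],
    edge_scores: List[List[List[float]]],
) -> Tuple[List[str], float]:
    """Viterbi search over a position-by-position lattice.

    candidates[t][k]      : k-th candidate token at position t
    node_scores[t][k]     : score of that candidate
    edge_scores[t][p][k]  : score of moving from candidate p at t to k at t+1
    """
    best = list(node_scores[0])       # best score of any path ending at each node
    back: List[List[int]] = []        # back-pointers for path recovery

    for t in range(1, len(candidates)):
        cur, ptr = [], []
        for k in range(len(candidates[t])):
            prev = max(
                range(len(candidates[t - 1])),
                key=lambda p: best[p] + edge_scores[t - 1][p][k],
            )
            cur.append(best[prev] + edge_scores[t - 1][prev][k] + node_scores[t][k])
            ptr.append(prev)
        best = cur
        back.append(ptr)

    # Follow back-pointers from the best final node.
    k = max(range(len(best)), key=best.__getitem__)
    total = best[k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)], total


tokens, total = best_path(
    candidates=[["a", "the"], ["cat", "cats"]],
    node_scores=[[0.0, 1.0], [0.5, 0.4]],
    edge_scores=[[[0.0, 2.0], [1.0, 0.0]]],
)
```

The search visits every edge once, so its cost grows linearly with the number of positions and quadratically with the number of candidates per position.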

Neural Linguistic Steganography
Zachary Ziegler | Yuntian Deng | Alexander Rush
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal. Language is a particularly pragmatic cover signal due to its benign occurrence and independence from any one medium. Traditionally, linguistic steganography systems encode secret messages in existing text via synonym substitution or word order rearrangements. Advances in neural language models enable previously impractical generation-based techniques. We propose a steganography technique based on arithmetic coding with large-scale neural language models. We find that our approach can generate realistic-looking cover sentences as evaluated by humans, while at the same time preserving security by matching the cover message distribution with the language model distribution.
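
The idea of arithmetic coding with a language model can be illustrated with a toy sketch: the secret bits are read as a binary fraction in [0, 1), and at each step the emitted token is the one whose probability interval contains that fraction. Everything concrete below is an assumption for illustration: `toy_lm` is a hypothetical fixed distribution standing in for a neural language model, exact `Fraction` arithmetic replaces the fixed-precision arithmetic a practical system would need, and the stopping rule is chosen only to make decoding unambiguous.

```python
from fractions import Fraction
from math import ceil
from typing import List, Sequence, Tuple

Dist = List[Tuple[str, Fraction]]


def toy_lm(prefix: Sequence[str]) -> Dist:
    """Hypothetical stand-in for a language model: a fixed next-token distribution."""
    return [("the", Fraction(1, 2)), ("a", Fraction(1, 4)), ("one", Fraction(1, 4))]


def encode(bits: str, lm=toy_lm) -> List[str]:
    """Turn secret bits into cover tokens by arithmetic decoding."""
    n = len(bits)
    x = Fraction(int(bits, 2), 2 ** n)  # secret as a fraction in [0, 1)
    low, high = Fraction(0), Fraction(1)
    tokens: List[str] = []
    # Emit tokens until the interval pins down x to n bits.
    while high - low > Fraction(1, 2 ** n):
        width, start = high - low, low
        for tok, p in lm(tokens):
            end = start + width * p
            if start <= x < end:
                tokens.append(tok)
                low, high = start, end
                break
            start = end
    return tokens


def decode(tokens: Sequence[str], n: int, lm=toy_lm) -> str:
    """Recover n secret bits by replaying the interval narrowing."""
    low, high = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        width, start = high - low, low
        for cand, p in lm(tokens[:i]):
            end = start + width * p
            if cand == tok:
                low, high = start, end
                break
            start = end
    # The final interval holds exactly one multiple of 2**-n: the secret.
    return format(ceil(low * 2 ** n), f"0{n}b")


cover = encode("1011")
recovered = decode(cover, 4)
```

Because each token is chosen in proportion to the width of its interval, uniformly random secret bits produce tokens that follow the language model's distribution, which is the property the abstract ties to security.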

OpenNMT: Neural Machine Translation Toolkit
Guillaume Klein | Yoon Kim | Yuntian Deng | Vincent Nguyen | Jean Senellart | Alexander Rush
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Bottom-Up Abstractive Summarization
Sebastian Gehrmann | Yuntian Deng | Alexander Rush
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural summarization produces outputs that are fluent and readable, but which can be poor at content selection, for instance often copying full sentences from the source document. This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpora. Furthermore, the content selector can be trained with as few as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.
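
One way to picture the bottom-up attention step is as a mask over the summarizer's attention: source tokens that the content selector considers unlikely are zeroed out, and the remaining attention is renormalized. The sketch below makes that concrete under stated assumptions: the function name, the 0.5 threshold, and the fallback to the unmasked attention are all hypothetical choices for illustration, not details given in the abstract.

```python
from typing import List, Sequence


def constrain_attention(
    attention: Sequence[float],
    select_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the abstract
) -> List[float]:
    """Restrict attention to source tokens the content selector marks as likely.

    attention[i]   : summarizer's attention weight on source token i
    select_prob[i] : content selector's probability that token i belongs in the summary
    """
    masked = [a if p >= threshold else 0.0 for a, p in zip(attention, select_prob)]
    total = sum(masked)
    if total == 0.0:
        # Nothing passed the selector: fall back to the unconstrained attention.
        return list(attention)
    return [m / total for m in masked]


constrained = constrain_attention(
    attention=[0.5, 0.25, 0.25],
    select_prob=[0.9, 0.1, 0.8],
)
```

Here the second source token is dropped and the remaining weight is redistributed over the first and third tokens. Because the mask is applied to an already trained summarizer's attention, the selector can be swapped or retrained independently, which fits the two-step design the abstract describes.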

Neural Machine Translation with Recurrent Attention Modeling
Zichao Yang | Zhiting Hu | Yuntian Deng | Chris Dyer | Alex Smola
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future. We improve upon the attention model of Bahdanau et al. (2014) by explicitly modeling the relationship between previous and subsequent attention levels for each word using one recurrent network per input word. This architecture easily captures informative features, such as fertility and regularities in relative distortion. In experiments, we show our parameterization of attention improves translation quality.

OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein | Yoon Kim | Yuntian Deng | Jean Senellart | Alexander Rush
Proceedings of ACL 2017, System Demonstrations

Learning Concept Taxonomies from Multi-modal Data
Hao Zhang | Zhiting Hu | Yuntian Deng | Mrinmaya Sachan | Zhicheng Yan | Eric Xing
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Entity Hierarchy Embedding
Zhiting Hu | Poyao Huang | Yuntian Deng | Yingkai Gao | Eric Xing
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)