Ashish Vaswani


2021

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy | Mohammad Saffar | Ashish Vaswani | David Grangier
Transactions of the Association for Computational Linguistics, Volume 9

Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic computation and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or a small set of locations independent of content. Our work proposes to learn dynamic sparse attention patterns that avoid allocating computation and memory to attend to content unrelated to the query of interest. This work builds upon two lines of research: it combines the modeling flexibility of prior work on content-based sparse attention with the efficiency gains from approaches based on local, temporal sparse attention. Our model, the Routing Transformer, endows self-attention with a sparse routing module based on online k-means while reducing the overall complexity of attention to O(n^1.5 d) from O(n^2 d) for sequence length n and hidden dimension d. We show that our model outperforms comparable sparse attention models on language modeling on WikiText-103 (15.8 vs 18.3 perplexity), as well as on image generation on ImageNet-64 (3.43 vs 3.44 bits/dim), while using fewer self-attention layers. Additionally, we set a new state of the art on the newly released PG-19 dataset, obtaining a test perplexity of 33.2 with a 22-layer Routing Transformer model trained on sequences of length 8192. We open-source the code for the Routing Transformer in TensorFlow.
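To make the routing idea concrete, here is a minimal numpy sketch (not the authors' TensorFlow implementation): queries and keys are assigned to the nearest of a set of shared centroids, standing in for the online k-means clusters learned during training, and each query attends only to keys routed to its cluster. With roughly sqrt(n) clusters of roughly sqrt(n) members each, the attention cost is about n * sqrt(n) * d, which is where the O(n^1.5 d) figure comes from.

```python
import numpy as np

def routing_attention(Q, K, V, num_clusters):
    """Illustrative content-based sparse attention (a sketch, not the paper's code):
    queries and keys are routed to the nearest of `num_clusters` shared centroids,
    and each query attends only to keys assigned to its cluster."""
    n, d = Q.shape
    # Hypothetical stand-in for the online k-means centroids learned during training.
    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(num_clusters, d))

    q_assign = np.argmin(((Q[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    k_assign = np.argmin(((K[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

    out = np.zeros_like(V)
    for c in range(num_clusters):
        qi = np.where(q_assign == c)[0]
        ki = np.where(k_assign == c)[0]
        if len(qi) == 0 or len(ki) == 0:
            continue
        # Dense attention restricted to the members of cluster c.
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        out[qi] = weights @ V[ki]
    return out

# With num_clusters ~ sqrt(n), each cluster holds ~sqrt(n) items, so the total
# attention cost is roughly n * sqrt(n) * d, matching the O(n^1.5 d) claim.
n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = routing_attention(Q, K, V, num_clusters=int(np.sqrt(n)))
```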

Simple and Efficient ways to Improve REALM
Vidhisha Balachandran | Ashish Vaswani | Yulia Tsvetkov | Niki Parmar
Proceedings of the 3rd Workshop on Machine Reading for Question Answering

Dense retrieval has been shown to be effective for Open Domain Question Answering, surpassing sparse retrieval methods like BM25. One such model, REALM (Guu et al., 2020), is an end-to-end dense retrieval system that uses MLM-based pretraining for improved downstream QA performance. However, the current REALM setup uses limited resources and is not comparable in scale to more recent systems, contributing to its lower performance. Additionally, it relies on noisy supervision for retrieval during fine-tuning. We propose REALM++, where we improve upon the training and inference setups and introduce a better supervision signal to improve performance, without any architectural changes. REALM++ achieves ~5.5% absolute accuracy gains over the baseline while being faster to train. It also matches the performance of large models that have 3x more parameters, demonstrating the efficiency of our setup.

2019

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Vihan Jain | Gabriel Magalhaes | Alexander Ku | Ashish Vaswani | Eugene Ie | Jason Baldridge
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes to move through environments and reach goals. Despite recent progress, current research leaves unclear how much of a role language understanding plays in this task, especially because dominant evaluation metrics have focused on goal completion rather than the sequence of actions corresponding to the instructions. Here, we highlight shortcomings of current metrics for the Room-to-Room dataset (Anderson et al., 2018b) and propose a new metric, Coverage weighted by Length Score (CLS). We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths. We join existing short paths to form more challenging extended paths, creating a new dataset, Room-for-Room (R4R). Using R4R and CLS, we show that agents that receive rewards for instruction fidelity outperform agents that focus on goal completion.
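As an illustration only, the sketch below computes a coverage-weighted-by-length score in the spirit of CLS over 2-D node coordinates: coverage measures how closely the agent's path tracks the reference path, and a length term penalizes paths much shorter or longer than expected. The distance measure, threshold, and length terms here are assumptions of this sketch and may differ from the exact definition in the paper.

```python
import numpy as np

def coverage_weighted_score(path, reference, d_th=3.0):
    """Illustrative coverage-weighted-by-length score (a hedged sketch, not the
    paper's exact CLS definition). `path` and `reference` are sequences of 2-D
    node coordinates."""
    P = np.asarray(path, dtype=float)
    R = np.asarray(reference, dtype=float)

    # Coverage: each reference node contributes according to its distance to the
    # nearest node on the agent's path.
    dists = np.linalg.norm(R[:, None, :] - P[None, :, :], axis=-1).min(axis=1)
    coverage = np.exp(-dists / d_th).mean()

    # Length score: penalize paths whose length deviates from the expected length.
    def path_len(X):
        return np.linalg.norm(np.diff(X, axis=0), axis=-1).sum()

    expected = coverage * path_len(R)
    length_score = expected / (expected + abs(expected - path_len(P)) + 1e-9)

    return coverage * length_score

# A path that covers most of the reference but stops early scores below 1.
score = coverage_weighted_score([(0, 0), (1, 0), (2, 0)],
                                [(0, 0), (1, 0), (2, 0), (3, 0)])
```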

2018

Self-Attention with Relative Position Representations
Peter Shaw | Jakob Uszkoreit | Ashish Vaswani
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.
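A minimal single-head numpy sketch of the mechanism, assuming learned per-offset embedding tables rel_k and rel_v indexed by clipped relative distances (the names and clipping scheme here are illustrative): the relative-key embeddings enter the attention logits, and the relative-value embeddings enter the weighted sum.

```python
import numpy as np

def relative_attention(X, Wq, Wk, Wv, rel_k, rel_v, max_dist):
    """Single-head self-attention with relative position representations
    (a sketch under the assumptions stated above)."""
    n, _ = X.shape
    d = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Clipped relative offsets j - i in [-max_dist, max_dist], shifted to table indices.
    offsets = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                      -max_dist, max_dist) + max_dist
    Ak = rel_k[offsets]          # (n, n, d) relative-key embeddings
    Av = rel_v[offsets]          # (n, n, d) relative-value embeddings

    # Logits: query i scores key j plus the embedding of their relative offset.
    scores = (Q @ K.T + np.einsum('id,ijd->ij', Q, Ak)) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)

    # Output: weighted sum over values plus relative-value embeddings.
    return weights @ V + np.einsum('ij,ijd->id', weights, Av)

n, dm, d, max_dist = 8, 16, 16, 4
rng = np.random.default_rng(0)
X = rng.normal(size=(n, dm))
Wq, Wk, Wv = (rng.normal(size=(dm, d)) for _ in range(3))
rel_k = rng.normal(size=(2 * max_dist + 1, d))
rel_v = rng.normal(size=(2 * max_dist + 1, d))
out = relative_attention(X, Wq, Wk, Wv, rel_k, rel_v, max_dist)
```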

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mia Xu Chen | Orhan Firat | Ankur Bapna | Melvin Johnson | Wolfgang Macherey | George Foster | Llion Jones | Mike Schuster | Noam Shazeer | Niki Parmar | Ashish Vaswani | Jakob Uszkoreit | Lukasz Kaiser | Zhifeng Chen | Yonghui Wu | Macduff Hughes
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was then outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT'14 English-to-French and English-to-German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.

Tensor2Tensor for Neural Machine Translation
Ashish Vaswani | Samy Bengio | Eugene Brevdo | Francois Chollet | Aidan Gomez | Stephan Gouws | Llion Jones | Łukasz Kaiser | Nal Kalchbrenner | Niki Parmar | Ryan Sepassi | Noam Shazeer | Jakob Uszkoreit
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

2016

Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States
Ashish Vaswani | Kenji Sagae
Transactions of the Association for Computational Linguistics, Volume 4

Transition-based approaches based on local classification are attractive for dependency parsing due to their simplicity and speed, despite producing results slightly below the state-of-the-art. In this paper, we propose a new approach for approximate structured inference for transition-based parsing that produces scores suitable for global scoring using local models. This is accomplished with the introduction of error states in local training, which add information about incorrect derivation paths typically left out completely in locally-trained models. Using neural networks for our local classifiers, our approach achieves 93.61% accuracy for transition-based dependency parsing in English.
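One hedged reading of how such error states could be generated, as a toy sketch: alongside each gold configuration and its correct action, also emit configurations reached by taking a single incorrect action, marked with a special error label so the local classifier also sees off-gold-path configurations. The transition system below is a trivial stand-in (a state is just the action history), not a real dependency parser, and the error label is an assumption of this sketch.

```python
# Toy stand-ins so the sketch runs: a "state" is just the tuple of actions taken.
ERROR_LABEL = "<error>"
ALL_ACTIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]

def apply_action(state, action):
    return state + (action,)

def make_training_states(gold_actions):
    """Collect local training examples along the gold derivation, plus one-step
    'error states' reached by each incorrect action (a hypothetical construction)."""
    examples = []
    state = ()
    for gold in gold_actions:
        examples.append((state, gold))  # gold configuration with its correct action
        for a in ALL_ACTIONS:
            if a != gold:
                # One step off the gold path: an error state, labeled so the local
                # classifier learns to score such configurations appropriately.
                examples.append((apply_action(state, a), ERROR_LABEL))
        state = apply_action(state, gold)
    return examples

examples = make_training_states(["SHIFT", "SHIFT", "LEFT-ARC", "RIGHT-ARC"])
```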

Supertagging With LSTMs
Ashish Vaswani | Yonatan Bisk | Kenji Sagae | Ryan Musa
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning
Boliang Zhang | Xiaoman Pan | Tianlu Wang | Ashish Vaswani | Heng Ji | Kevin Knight | Daniel Marcu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies
Barret Zoph | Ashish Vaswani | Jonathan May | Kevin Knight
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Neural Hidden Markov Models
Ke M. Tran | Yonatan Bisk | Ashish Vaswani | Daniel Marcu | Kevin Knight
Proceedings of the Workshop on Structured Prediction for NLP

2015

Model Invertibility Regularization: Sequence Alignment With or Without Parallel Data
Tomer Levinboim | Ashish Vaswani | David Chiang
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unifying Bayesian Inference and Vector Space Models for Improved Decipherment
Qing Dou | Ashish Vaswani | Kevin Knight | Chris Dyer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Aligning context-based statistical models of language with brain activity during reading
Leila Wehbe | Ashish Vaswani | Kevin Knight | Tom Mitchell
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation
Qing Dou | Ashish Vaswani | Kevin Knight
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Decoding with Large-Scale Neural Language Models Improves Translation
Ashish Vaswani | Yinggong Zhao | Victoria Fossum | David Chiang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Learning Whom to Trust with MACE
Dirk Hovy | Taylor Berg-Kirkpatrick | Ashish Vaswani | Eduard Hovy
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the ℓ0-norm
Ashish Vaswani | Liang Huang | David Chiang
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Rule Markov Models for Fast Tree-to-String Translation
Ashish Vaswani | Haitao Mi | Liang Huang | David Chiang
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Models and Training for Unsupervised Preposition Sense Disambiguation
Dirk Hovy | Ashish Vaswani | Stephen Tratz | David Chiang | Eduard Hovy
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging
Ashish Vaswani | Adam Pauls | David Chiang
Proceedings of the ACL 2010 Conference Short Papers

Fast, Greedy Model Minimization for Unsupervised Tagging
Sujith Ravi | Ashish Vaswani | Kevin Knight | David Chiang
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2007

Hassan: A Virtual Human for Tactical Questioning
David Traum | Antonio Roque | Anton Leuski | Panayiotis Georgiou | Jillian Gerten | Bilyana Martinovski | Shrikanth Narayanan | Susan Robinson | Ashish Vaswani
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue