Generate-and-Retrieve: Use Your Predictions to Improve Retrieval for Semantic Parsing
Yury Zemlyanskiy | Michiel de Jong | Joshua Ainslie | Panupong Pasupat | Peter Shaw | Linlu Qiu | Sumit Sanghai | Fei Sha
Proceedings of the 29th International Conference on Computational Linguistics

A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on the similarity of query and exemplar inputs. We propose GandR, a retrieval procedure that retrieves exemplars whose outputs are also similar. GandR first generates a preliminary prediction with input-based retrieval. Then, it retrieves exemplars with outputs similar to the preliminary prediction, and these exemplars are used to generate a final prediction. GandR sets the state of the art on multiple low-resource semantic parsing tasks.
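
The two-stage procedure described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the Jaccard token-overlap similarity, the mixing weight `alpha`, the helper names, and the toy `copy_first_output` generator (a stand-in for a sequence-to-sequence model) are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

Exemplar = Tuple[str, str]  # (input utterance, output parse)
Generator = Callable[[str, List[Exemplar]], str]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (assumed stand-in for the real similarity)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest scores; ties keep original order."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def generate_and_retrieve(query: str, train: List[Exemplar],
                          generate: Generator, k: int = 1,
                          alpha: float = 0.5) -> str:
    # Stage 1: retrieve by input similarity, then make a preliminary prediction.
    input_scores = [jaccard(query, x) for x, _ in train]
    first = [train[i] for i in top_k(input_scores, k)]
    preliminary = generate(query, first)

    # Stage 2: re-score exemplars, now also rewarding outputs that resemble
    # the preliminary prediction (alpha is a hypothetical mixing weight).
    mixed = [(1 - alpha) * s_in + alpha * jaccard(preliminary, y)
             for s_in, (_, y) in zip(input_scores, train)]
    second = [train[i] for i in top_k(mixed, k)]
    return generate(query, second)


def copy_first_output(query: str, exemplars: List[Exemplar]) -> str:
    """Toy generator: returns the first exemplar's parse unchanged."""
    return exemplars[0][1]


toy_train = [
    ("show flights to boston", "list_flights ( dest = boston )"),
    ("show hotels in boston", "list_hotels ( city = boston )"),
    ("find flights from denver", "list_flights ( origin = denver )"),
]
final = generate_and_retrieve("show flights from denver", toy_train,
                              copy_first_output)
```

The toy generator exists only to make the sketch runnable; in a real system `generate` would be a trained model that conditions on the query with the retrieved exemplars appended.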


ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie | Santiago Ontanon | Chris Alberti | Vaclav Cvicek | Zachary Fisher | Philip Pham | Anirudh Ravula | Sumit Sanghai | Qifan Wang | Li Yang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, “Extended Transformer Construction” (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. To scale attention to longer inputs, we introduce a novel global-local attention mechanism between global tokens and regular input tokens. We also show that combining global-local attention with relative position encodings and a “Contrastive Predictive Coding” (CPC) pre-training objective allows ETC to encode structured inputs. We achieve state-of-the-art results on four natural language datasets requiring long and/or structured inputs.
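
To make the global-local attention idea concrete, here is a minimal sketch of an attention mask in which a few global tokens attend everywhere while regular input tokens attend to the global tokens and to a local window. The layout (global tokens first), the `radius` parameter, and the function names are assumptions made for illustration; the abstract does not specify these details, and the sketch ignores relative position encodings and any structure-specific masking.

```python
from typing import List


def global_local_mask(n_global: int, n_long: int,
                      radius: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Tokens 0..n_global-1 are global; the remaining n_long tokens are
    regular input tokens (assumed layout).
    """
    n = n_global + n_long
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_global or j < n_global:
                # Any pair involving a global token is allowed.
                mask[i][j] = True
            else:
                # Regular tokens see each other only within a local window.
                mask[i][j] = abs(i - j) <= radius
    return mask


def count_allowed(mask: List[List[bool]]) -> int:
    """Number of (query, key) pairs the mask permits."""
    return sum(sum(row) for row in mask)


mask = global_local_mask(n_global=2, n_long=5, radius=1)
allowed = count_allowed(mask)   # pairs permitted by the sparse mask
full = len(mask) ** 2           # pairs in full self-attention
```

With a fixed number of global tokens and a fixed radius, the number of permitted pairs grows linearly in the number of regular tokens rather than quadratically, which is the sense in which such a scheme can scale to longer inputs.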