2022
Generate, Annotate, and Learn: NLP with Synthetic Text
Xuanli He | Islam Nassar | Jamie Kiros | Gholamreza Haffari | Mohammad Norouzi
Transactions of the Association for Computational Linguistics, Volume 10
This paper studies the use of language models as a source of synthetic unlabeled text for NLP. We formulate a general framework called “generate, annotate, and learn (GAL)” to take advantage of synthetic text within knowledge distillation, self-training, and few-shot learning applications. To generate high-quality task-specific text, we either fine-tune LMs on inputs from the task of interest, or prompt large LMs with few examples. We use the best available classifier to annotate synthetic text with soft pseudo labels for knowledge distillation and self-training, and use LMs to obtain hard labels for few-shot learning. We train new supervised models on the combination of labeled and pseudo-labeled data, which results in significant gains across several applications. We investigate key components of GAL and present theoretical and empirical arguments against the use of class-conditional LMs to generate synthetic labeled text instead of unlabeled text. GAL achieves new state-of-the-art knowledge distillation results for 6-layer transformers on the GLUE leaderboard.
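To make the annotate-and-learn step concrete, here is a minimal PyTorch sketch of one training update that mixes gold-labeled data with teacher-annotated synthetic text. The model, optimizer, and batch names are illustrative assumptions, and the generation step is assumed to have already produced the synthetic inputs; this is not the authors' released code.

    # Minimal sketch of a GAL-style annotate-and-learn update (hypothetical names),
    # assuming PyTorch classifiers that map input tensors to class logits.
    import torch
    import torch.nn.functional as F

    def gal_step(student, teacher, optimizer, labeled_batch, synthetic_batch, temperature=1.0):
        """One update on real labeled data plus LM-generated synthetic text."""
        x_labeled, y_labeled = labeled_batch

        # Annotate: the best available classifier supplies soft pseudo-labels.
        with torch.no_grad():
            soft_targets = F.softmax(teacher(synthetic_batch) / temperature, dim=-1)

        # Learn: supervised loss on gold labels + distillation loss on synthetic text.
        ce_loss = F.cross_entropy(student(x_labeled), y_labeled)
        kd_loss = F.kl_div(
            F.log_softmax(student(synthetic_batch) / temperature, dim=-1),
            soft_targets,
            reduction="batchmean",
        )
        loss = ce_loss + kd_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()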
2020
Multichannel Generative Language Model: Learning All Possible Factorizations Within and Across Channels
Harris Chan | Jamie Kiros | William Chan
Findings of the Association for Computational Linguistics: EMNLP 2020
A channel corresponds to a viewpoint or transformation of an underlying meaning. A pair of parallel sentences in English and French expresses the same underlying meaning, but through two separate channels corresponding to their languages. In this work, we present the Multichannel Generative Language Model (MGLM). MGLM is a generative joint distribution model over channels that marginalizes over all possible factorizations within and across all channels. MGLM enables flexible inference, including unconditional generation, conditional generation (where one channel is observed and the other channels are generated), and partially observed generation (where incomplete observations are spread across all the channels). We experiment with the Multi30K dataset containing English, French, Czech, and German, and demonstrate unconditional, conditional, and partially observed generation. We provide qualitative samples drawn unconditionally from the generative joint distribution, quantitatively analyze the quality-diversity trade-offs, and find that MGLM outperforms traditional bilingual discriminative models.
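As a toy illustration of what "marginalizing over factorizations" means in practice (not the paper's exact objective), the sketch below samples one random generation order over the tokens of several channels; the channel names and token representation are our assumptions.

    # Toy illustration: sample one factorization order over multichannel tokens.
    # Training on many such sampled orders approximates marginalizing over all of them.
    import random

    def sample_factorization(channels, seed=None):
        rng = random.Random(seed)
        slots = [(ch, i, tok) for ch, toks in channels.items() for i, tok in enumerate(toks)]
        rng.shuffle(slots)  # one ordering out of the full set being marginalized over
        return slots

    order = sample_factorization({"en": ["a", "dog", "runs"], "fr": ["un", "chien", "court"]})
    # e.g. [("fr", 2, "court"), ("en", 0, "a"), ...]: predict tokens in this order,
    # each conditioned on the (channel, position, token) triples revealed so far.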
An Empirical Study of Generation Order for Machine Translation
William Chan | Mitchell Stern | Jamie Kiros | Jakob Uszkoreit
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-based orders, frequency-based orders, content-based orders, and model-based orders. Curiously, we find that for the WMT’14 English → German and WMT’18 English → Chinese translation tasks, order does not have a substantial impact on output quality. Moreover, for English → German, we even discover that unintuitive orderings such as alphabetical and shortest-first can match the performance of a standard Transformer, suggesting that traditional left-to-right generation may not be necessary to achieve high performance.
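The flavor of the orders compared can be sketched as simple oracle policies over target tokens; the function below is illustrative only (names are ours) and does not reproduce the paper's training framework.

    # Illustrative oracle generation orders over a list of target tokens.
    import random

    def generation_order(tokens, policy="left_to_right", seed=0):
        idx = list(range(len(tokens)))
        if policy == "left_to_right":
            return idx
        if policy == "right_to_left":
            return idx[::-1]
        if policy == "alphabetical":            # content-based order
            return sorted(idx, key=lambda i: tokens[i])
        if policy == "shortest_first":          # emit short tokens before long ones
            return sorted(idx, key=lambda i: len(tokens[i]))
        if policy == "random":                  # uninformed order
            rng = random.Random(seed)
            rng.shuffle(idx)
            return idx
        raise ValueError(f"unknown policy: {policy}")

    print(generation_order(["the", "cat", "sat"], "alphabetical"))  # [1, 2, 0]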
2018
Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search
Jamie Kiros | William Chan | Geoffrey Hinton
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce Picturebook, a large-scale lookup operation to ground language via ‘snapshots’ of our physical world accessed through image search. For each word in a vocabulary, we extract the top-k images from Google image search and feed the images through a convolutional network to extract a word embedding. We introduce a multimodal gating function to fuse our Picturebook embeddings with other word representations. We also introduce Inverse Picturebook, a mechanism to map a Picturebook embedding back into words. We experiment and report results across a wide range of tasks: word similarity, natural language inference, semantic relatedness, sentiment/topic classification, image-sentence ranking and machine translation. We also show that gate activations corresponding to Picturebook embeddings are highly correlated with human judgments of word concreteness.
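A multimodal gate of this kind can be sketched in a few lines of PyTorch; the dimensions and the exact gate parameterization below are assumptions rather than the paper's implementation.

    # Sketch of gated fusion of a textual embedding with an image-grounded one.
    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        def __init__(self, dim_text, dim_picture, dim_out):
            super().__init__()
            self.proj_text = nn.Linear(dim_text, dim_out)
            self.proj_pict = nn.Linear(dim_picture, dim_out)
            self.gate = nn.Linear(dim_text + dim_picture, dim_out)

        def forward(self, e_text, e_picture):
            # g in (0, 1) decides, per dimension, how much of the image-grounded
            # Picturebook-style embedding to mix into the textual representation.
            g = torch.sigmoid(self.gate(torch.cat([e_text, e_picture], dim=-1)))
            return g * self.proj_pict(e_picture) + (1 - g) * self.proj_text(e_text)

    fuse = GatedFusion(dim_text=300, dim_picture=512, dim_out=300)
    word = fuse(torch.randn(1, 300), torch.randn(1, 512))  # fused word embedding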
Proceedings of the Third Workshop on Representation Learning for NLP
Isabelle Augenstein | Kris Cao | He He | Felix Hill | Spandana Gella | Jamie Kiros | Hongyuan Mei | Dipendra Misra
Proceedings of the Third Workshop on Representation Learning for NLP
InferLite: Simple Universal Sentence Representations from Natural Language Inference Data
Jamie Kiros | William Chan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Natural language inference has been shown to be an effective supervised task for learning generic sentence embeddings. In order to better understand the components that lead to effective representations, we propose a lightweight version of InferSent, called InferLite, that does not use any recurrent layers and operates on a collection of pre-trained word embeddings. We show that a simple instance of our model that makes no use of context, word ordering or position can still obtain competitive performance on the majority of downstream prediction tasks, with most performance gaps being filled by adding local contextual information through temporal convolutions. Our models can be trained in under one hour on a single GPU and allow for fast inference of new representations. Finally, we describe a semantic hashing layer that allows our model to learn generic binary codes for sentences.
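A recurrence-free encoder in this spirit can be sketched as a gated max-pool over pre-trained word embeddings, with an optional temporal convolution adding local context; the layer sizes and names below are assumptions, not the released model.

    # Sketch of an order-free sentence encoder over pre-trained word embeddings.
    import torch
    import torch.nn as nn

    class LiteSentenceEncoder(nn.Module):
        def __init__(self, dim_word=300, dim_sent=1024, local_context=False):
            super().__init__()
            self.feature = nn.Linear(dim_word, dim_sent)
            self.gate = nn.Linear(dim_word, dim_sent)
            # Optional temporal convolution injects local word-order information.
            self.conv = nn.Conv1d(dim_word, dim_word, kernel_size=3, padding=1) if local_context else None

        def forward(self, word_embs):          # word_embs: (batch, seq_len, dim_word)
            if self.conv is not None:
                word_embs = self.conv(word_embs.transpose(1, 2)).transpose(1, 2)
            h = torch.sigmoid(self.gate(word_embs)) * self.feature(word_embs)
            return h.max(dim=1).values         # max-pool over the sequence

    enc = LiteSentenceEncoder(local_context=True)
    sent_vec = enc(torch.randn(2, 7, 300))     # two sentences of 7 tokens each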
2016
Towards Generalizable Sentence Embeddings
Eleni Triantafillou | Jamie Ryan Kiros | Raquel Urtasun | Richard Zemel
Proceedings of the 1st Workshop on Representation Learning for NLP