A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Mikhail Khodak; Nikunj Saunshi; Yingyu Liang; Tengyu Ma; Brandon M. Stewart; Sanjeev Arora

doi:10.18653/v1/P18-1002

Abstract

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.
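
The core idea in the abstract, a linear transform fit by regression on pretrained word vectors and then applied to new features, can be illustrated with a minimal sketch. The abstract does not spell out the regression inputs; the sketch assumes each word is summarized by the average of the pretrained vectors of the words in its contexts, and that the transform maps this average onto the word's own pretrained vector. The function names (`context_average`, `fit_linear_transform`, `induce_embedding`) are hypothetical and are not taken from the authors' released code.

```python
import numpy as np

def context_average(contexts, vectors):
    """Average the pretrained vectors of all in-vocabulary context tokens.

    contexts: list of token lists (each list is one usage example).
    vectors:  dict mapping token -> 1-D numpy array (pretrained embeddings).
    """
    dim = len(next(iter(vectors.values())))
    total = np.zeros(dim)
    count = 0
    for context in contexts:
        for token in context:
            if token in vectors:  # skip out-of-vocabulary tokens
                total += vectors[token]
                count += 1
    return total / max(count, 1)  # zero vector if nothing was in vocabulary

def fit_linear_transform(X, Y):
    """Least-squares fit of M such that X @ M approximates Y.

    X: (n_words, dim) context averages of words with known embeddings.
    Y: (n_words, dim) pretrained embeddings of those same words.
    """
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M

def induce_embedding(contexts, vectors, M):
    """Embed a new feature from its usage examples, even a single one."""
    return context_average(contexts, vectors) @ M
```

In this sketch the transform is fit once on words that already have pretrained vectors; `induce_embedding` can then be called whenever a rare word or other feature appears, using whatever contexts are available at that point.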

Anthology ID: P18-1002
Volume: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2018
Address: Melbourne, Australia
Editors: Iryna Gurevych, Yusuke Miyao
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 12–22
URL: https://aclanthology.org/P18-1002/
DOI: 10.18653/v1/P18-1002
Cite (ACL): Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, and Sanjeev Arora. 2018. A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12–22, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors (Khodak et al., ACL 2018)
PDF: https://aclanthology.org/P18-1002.pdf
Presentation: P18-1002.Presentation.pdf
Video: https://aclanthology.org/P18-1002.mp4
Code: NLPrinceton/ALaCarte
Data: IMDb Movie Reviews, MPQA Opinion Corpus, MR, SST, SST-2, SST-5, SUBJ
