Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency

Zhuang Ma; Michael Collins

doi:10.18653/v1/D18-1405

Abstract

Noise Contrastive Estimation (NCE) is a powerful parameter estimation method for log-linear models, which avoids calculation of the partition function or its derivatives at each training step, a computationally demanding step in many cases. It is closely related to negative sampling methods, now widely used in NLP. This paper considers NCE-based estimation of conditional models. Conditional models are frequently encountered in practice; however there has not been a rigorous theoretical analysis of NCE in this setting, and we will argue there are subtle but important questions when generalizing NCE to the conditional case. In particular, we analyze two variants of NCE for conditional models: one based on a classification objective, the other based on a ranking objective. We show that the ranking-based variant of NCE gives consistent parameter estimates under weaker assumptions than the classification-based method; we analyze the statistical efficiency of the ranking-based and classification-based variants of NCE; finally we describe experiments on synthetic data and language modeling showing the effectiveness and tradeoffs of both methods.
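To make the two variants named in the abstract concrete, here is a minimal NumPy sketch of one common way to write a ranking-style and a classification-style NCE loss for a single training example. It is an illustration under stated assumptions, not the authors' code: the function names, the array layout (index 0 holds the observed label, indices 1..K hold noise samples), and the exact form of the noise corrections are choices made for this sketch.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def ranking_nce_loss(scores, log_q):
    """Ranking-style loss for one example (hypothetical helper).

    scores[0], log_q[0]: model score and noise log-probability of the observed label.
    scores[1:], log_q[1:]: the same quantities for K labels drawn from the noise distribution.
    """
    adjusted = scores - log_q
    # Softmax cross-entropy over the K+1 candidates; the observed label is the target.
    return -(adjusted[0] - np.logaddexp.reduce(adjusted))


def binary_nce_loss(scores, log_q):
    """Classification-style loss for one example (hypothetical helper), same layout."""
    k = len(scores) - 1
    adjusted = scores - (np.log(k) + log_q)
    # The observed label is classified as "data", each noise sample as "noise".
    return -(log_sigmoid(adjusted[0]) + np.sum(log_sigmoid(-adjusted[1:])))


# Toy inputs (assumed values): one observed label plus K = 3 noise samples.
scores = np.array([2.0, 0.5, -1.0, 0.0])
log_q = np.log(np.array([0.4, 0.3, 0.2, 0.1]))

print(ranking_nce_loss(scores, log_q))
print(binary_nce_loss(scores, log_q))
```

In this sketch the ranking loss is unchanged when the same constant is added to every score for a given input, because the constant cancels inside the softmax, while the classification loss is not. That difference is one way to see why the treatment of an input-dependent normalization term matters in the conditional setting.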

Anthology ID: D18-1405
Volume: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month: October-November
Year: 2018
Address: Brussels, Belgium
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 3698–3707
URL: https://aclanthology.org/D18-1405/
DOI: 10.18653/v1/D18-1405
Cite (ACL): Zhuang Ma and Michael Collins. 2018. Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3698–3707, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency (Ma & Collins, EMNLP 2018)
PDF: https://aclanthology.org/D18-1405.pdf
Attachment: D18-1405.Attachment.pdf
Video: https://aclanthology.org/D18-1405.mp4
Data: Penn Treebank, WikiQA