Learning with Noise-Contrastive Estimation: Easing training by learning to scale

Matthieu Labeau, Alexandre Allauzen


Abstract
Noise-Contrastive Estimation (NCE) is a learning criterion that is regularly used to train neural language models in place of Maximum Likelihood Estimation, since it avoids the computational bottleneck caused by the output softmax. In this paper, we analyse and explain some of the weaknesses of this objective function, linked to the mechanism of self-normalization, by closely monitoring comparative experiments. We then explore several remedies and modifications to propose tractable and efficient NCE training strategies. In particular, we propose to make the scaling factor a trainable parameter of the model, and to use the noise distribution to initialize the output bias. These solutions, though simple, yield stable and competitive performance in both small- and large-scale language modelling tasks.
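The two proposals in the abstract can be illustrated with a minimal sketch of a standard NCE loss for a single training example. This is not the authors' implementation: the function names, the placement of the scaling term as a log-scale subtracted from every score, and the toy noise distribution are all assumptions made for illustration.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def init_output_bias(noise_probs):
    # Assumed reading of "use the noise distribution to initialize the
    # output bias": start each word's bias at its log noise probability.
    return [math.log(p) for p in noise_probs]

def nce_loss(score_data, q_data, scores_noise, q_noise, log_scale):
    """NCE loss for one observed word and k noise samples.

    score_data   : unnormalized model score of the observed word
    q_data       : noise probability of the observed word
    scores_noise : unnormalized model scores of the k noise words
    q_noise      : noise probabilities of the k noise words
    log_scale    : log of the scaling factor (assumed trainable; in a real
                   model it would be updated by gradient descent like any
                   other parameter)
    """
    k = len(scores_noise)
    # Log-odds that the observed word came from the data, not the noise.
    delta = score_data - log_scale - math.log(k * q_data)
    loss = -log_sigmoid(delta)
    for s, q in zip(scores_noise, q_noise):
        delta = s - log_scale - math.log(k * q)
        # Noise samples should be classified as noise: log(1 - sigmoid(delta)).
        loss -= log_sigmoid(-delta)
    return loss

# Toy noise distribution over a 3-word vocabulary (hypothetical values).
q = [0.5, 0.3, 0.2]
bias = init_output_bias(q)
# With scores equal to the initial bias, the model matches the noise
# distribution exactly, so every sample has the same log-odds, -log(k).
loss = nce_loss(bias[0], q[0], [bias[1], bias[2]], [q[1], q[2]], log_scale=0.0)
```

In this sketch, treating `log_scale` as a free parameter lets the model absorb the normalization constant rather than having to learn self-normalized scores directly, which is one way to read the abstract's "learning to scale".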
Anthology ID:
C18-1261
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3090–3101
URL:
https://aclanthology.org/C18-1261/
Cite (ACL):
Matthieu Labeau and Alexandre Allauzen. 2018. Learning with Noise-Contrastive Estimation: Easing training by learning to scale. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3090–3101, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Learning with Noise-Contrastive Estimation: Easing training by learning to scale (Labeau & Allauzen, COLING 2018)
PDF:
https://aclanthology.org/C18-1261.pdf
Data
Billion Word Benchmark