Doc2hash: Learning Discrete Latent variables for Documents Retrieval

Yifei Zhang, Hao Zhu


Abstract
Learning to hash via generative models has become a powerful paradigm for fast similarity search in document retrieval. To obtain binary representations (i.e., hash codes), a discrete prior (i.e., the Bernoulli distribution) is used to train a variational autoencoder (VAE). However, the discrete stochastic layer is usually incompatible with backpropagation during training, and thus causes a gradient flow problem because of non-differentiable operators. The reparameterization trick for sampling from a discrete distribution usually includes non-differentiable operators. In this paper, we propose a method, Doc2hash, that solves the gradient flow problem of the discrete stochastic layer by applying a continuous relaxation to the priors, and trains the generative model in an end-to-end manner to generate hash codes. In qualitative and quantitative experiments, we show that the proposed model outperforms other state-of-the-art methods.
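As an illustration of what a continuous relaxation of a Bernoulli latent variable can look like, the following is a minimal sketch of the binary Concrete (Gumbel-sigmoid) relaxation. It is an assumption chosen for illustration and is not confirmed here as the exact formulation used in Doc2hash; the function names, the example logits, and the temperature value are all hypothetical.

```python
import math
import random


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relaxed_bernoulli_sample(logits, temperature, rng):
    """Draw one relaxed sample in [0, 1] per Bernoulli logit.

    Assumed form (binary Concrete / Gumbel-sigmoid):
        z = sigmoid((logit + log(u) - log(1 - u)) / temperature)
    with u ~ Uniform(0, 1). Every step is differentiable in `logit`,
    unlike thresholding a uniform draw to obtain a hard 0/1 sample.
    """
    samples = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
        logistic_noise = math.log(u) - math.log(1.0 - u)
        samples.append(_sigmoid((logit + logistic_noise) / temperature))
    return samples


def binarize(relaxed):
    """Threshold relaxed values at 0.5 to obtain hard hash bits."""
    return [1 if z > 0.5 else 0 for z in relaxed]


# Hypothetical encoder outputs for a 4-bit hash code.
rng = random.Random(0)
logits = [2.0, -2.0, 0.0, 4.0]
relaxed = relaxed_bernoulli_sample(logits, temperature=0.5, rng=rng)
hash_code = binarize(relaxed)
print(relaxed, hash_code)
```

In a sketch like this, lower temperatures push the relaxed values toward 0 or 1, so the relaxed samples used in training resemble the hard bits obtained by thresholding at retrieval time.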
Anthology ID:
N19-1232
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2235–2240
URL:
https://aclanthology.org/N19-1232
DOI:
10.18653/v1/N19-1232
Cite (ACL):
Yifei Zhang and Hao Zhu. 2019. Doc2hash: Learning Discrete Latent variables for Documents Retrieval. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2235–2240, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Doc2hash: Learning Discrete Latent variables for Documents Retrieval (Zhang & Zhu, NAACL 2019)
PDF:
https://aclanthology.org/N19-1232.pdf
Code
 yifeiacc/doc2hash
Data
RCV1