Dmitry Vetrov
2019
Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks
Maxim Kodryan | Artem Grachev | Dmitry Ignatov | Dmitry Vetrov
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Reducing the number of parameters is one of the most important goals in deep learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression. We find this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, allowing even better sparsity and performance to be achieved. Our experiments demonstrate that more than 90% of the weights in both the encoder and decoder layers can be removed with minimal quality loss.
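The pruning criterion behind ARD-style sparsification can be sketched as follows. This is a hypothetical illustration, not the authors' code: we assume each weight has a learned Gaussian posterior N(mu, sigma^2), and prune weights whose noise-to-signal ratio log alpha = log(sigma^2 / mu^2) exceeds a threshold (all variable names and values here are illustrative stand-ins).

```python
import numpy as np

# Illustrative sketch of ARD-style pruning (not the paper's implementation).
# Each weight w has a Gaussian posterior N(mu, sigma^2); a large ratio
# sigma^2 / mu^2 means the posterior is dominated by noise, so the weight
# carries little information and can be set to exactly zero.

rng = np.random.default_rng(0)
mu = rng.normal(size=(4, 4))                  # posterior means (stand-in values)
log_sigma2 = rng.normal(-3, 2, size=(4, 4))   # posterior log-variances (stand-in)

# log alpha = log(sigma^2) - log(mu^2); epsilon guards against log(0)
log_alpha = log_sigma2 - np.log(mu ** 2 + 1e-8)

THRESHOLD = 3.0                    # a common cutoff in the sparse-VD literature
mask = log_alpha < THRESHOLD       # True where the weight is still "relevant"
w_pruned = np.where(mask, mu, 0.0) # replace noisy weights with exact zeros

sparsity = 1.0 - mask.mean()
print(f"pruned fraction: {sparsity:.2f}")
```

At inference time only the nonzero means survive, which is what yields the reported compression of the encoder and decoder layers.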
2018
Conditional Generators of Words Definitions
Artyom Gadetsky | Ilya Yakubovskiy | Dmitry Vetrov
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
We explore the recently introduced definition modeling technique, which provides a tool for evaluating different distributed vector representations of words through modeling their dictionary definitions. In this work, we study the problem of word ambiguity in definition modeling and propose a possible solution that employs latent variable modeling and soft attention mechanisms. Our quantitative and qualitative evaluation and analysis of the model show that taking words' ambiguity and polysemy into account leads to performance improvements.
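The soft attention idea mentioned above can be sketched in a few lines. This is a hypothetical toy example, not the paper's model: we assume a set of candidate "sense" vectors for a polysemous word and a context vector encoding its example usage; attention mixes the senses into one conditioning vector for the definition decoder.

```python
import numpy as np

# Toy sketch of soft attention over word senses (illustrative only).
# A polysemous word has several candidate sense vectors; the context in
# which it appears determines a soft mixture of them.

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(2)
senses = rng.normal(size=(3, 6))   # 3 candidate sense vectors, dim 6 (stand-ins)
context = rng.normal(size=6)       # encoding of the example usage (stand-in)

scores = senses @ context          # dot-product relevance of each sense
weights = softmax(scores)          # soft attention distribution over senses
conditioning = weights @ senses    # mixture fed to the definition generator
```

The attention weights form a proper distribution over senses, so an ambiguous word is resolved softly rather than by a hard sense choice.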
Bayesian Compression for Natural Language Processing
Nadezhda Chirkova | Ekaterina Lobacheva | Dmitry Vetrov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters. The majority of these parameters are often concentrated in the embedding layer, whose size grows proportionally to the vocabulary size. We propose a Bayesian sparsification technique for RNNs that allows compressing an RNN dozens or hundreds of times without time-consuming hyperparameter tuning. We also generalize the model to vocabulary sparsification, filtering out unnecessary words and compressing the RNN even further. We show that the choice of kept words is interpretable.
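Vocabulary sparsification can be sketched by grouping the per-weight relevance criterion at the level of embedding rows. This is an illustrative sketch with made-up values, not the paper's implementation: if every weight in a word's embedding row is effectively noise, the word itself can be dropped from the vocabulary.

```python
import numpy as np

# Illustrative sketch of vocabulary-level sparsification (not the paper's code).
# Per-weight "noise-to-signal" scores (log alpha) are aggregated per embedding
# row; a word is kept only if at least one of its embedding weights survives.

rng = np.random.default_rng(1)
vocab = ["the", "cat", "sat", "qux", "zzz"]            # toy vocabulary
emb_mu = rng.normal(size=(5, 8))                       # embedding means (stand-ins)
emb_log_alpha = rng.normal(0, 4, size=(5, 8))          # per-weight log alpha (stand-ins)

THRESHOLD = 3.0
row_relevant = (emb_log_alpha < THRESHOLD).any(axis=1) # word kept if any weight survives

kept_words = [w for w, keep in zip(vocab, row_relevant) if keep]
emb_pruned = emb_mu * row_relevant[:, None]            # zero whole rows of dropped words

print("kept:", kept_words)
```

Because pruning removes whole rows, the surviving vocabulary is directly inspectable, which is what makes the choice of kept words interpretable.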