Why ADAGRAD Fails for Online Topic Modeling

You Lu, Jeffrey Lund, Jordan Boyd-Graber


Abstract
Online topic modeling, i.e., topic modeling with stochastic variational inference, is a powerful and efficient technique for analyzing large datasets, and ADAGRAD is a widely-used technique for tuning learning rates during online gradient optimization. However, these two techniques do not work well together. We show that this is because ADAGRAD uses the accumulation of previous gradients as the denominator of the learning rates. In online topic modeling, the magnitude of the gradients is very large, which causes the learning rates to shrink very quickly, so the parameters cannot fully converge before training ends.
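
The following minimal sketch (not the authors' implementation; the gradient values and hyperparameters are hypothetical) illustrates the mechanism the abstract describes: ADAGRAD divides the base learning rate by the square root of the accumulated squared gradients, so when gradients have very large magnitude, as in online topic modeling, the effective step size collapses after only a few updates.

import numpy as np

def adagrad_step(theta, grad, accum, eta=0.1, eps=1e-8):
    # One ADAGRAD update: accumulate squared gradients, then scale the step
    # by eta / sqrt(accumulated squared gradients).
    accum += grad ** 2
    effective_lr = eta / (np.sqrt(accum) + eps)
    theta -= effective_lr * grad
    return theta, accum, effective_lr

# Hypothetical large-magnitude gradients, as encountered in online topic modeling.
theta = np.zeros(1)
accum = np.zeros(1)
for t in range(1, 6):
    grad = np.array([1000.0])  # assumed constant gradient magnitude for illustration
    theta, accum, lr = adagrad_step(theta, grad, accum)
    print(f"step {t}: effective learning rate = {lr[0]:.2e}")

# The printed learning rate decays roughly as eta / (|g| * sqrt(t)); with |g| = 1000
# it is already 1e-4 after one step, so the parameters stop moving long before they converge.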
Anthology ID:
D17-1046
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
446–451
URL:
https://aclanthology.org/D17-1046
DOI:
10.18653/v1/D17-1046
Cite (ACL):
You Lu, Jeffrey Lund, and Jordan Boyd-Graber. 2017. Why ADAGRAD Fails for Online Topic Modeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 446–451, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Why ADAGRAD Fails for Online Topic Modeling (Lu et al., EMNLP 2017)
PDF:
https://aclanthology.org/D17-1046.pdf
Attachment:
 D17-1046.Attachment.zip