Pyramidal Recurrent Unit for Language Modeling

Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi


Abstract
LSTMs are powerful tools for modeling contextual information, as evidenced by their success at the task of language modeling. However, modeling contexts in very high dimensional space can lead to poor generalizability. We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high dimensional space with more generalization power and fewer parameters. PRUs replace the linear transformation in LSTMs with more sophisticated interactions such as pyramidal or grouped linear transformations. This architecture gives strong results on word-level language modeling while reducing parameters significantly. In particular, PRU improves the perplexity of a recent state-of-the-art language model by up to 1.3 points while learning 15-20% fewer parameters. For a similar number of model parameters, PRU outperforms all previous RNN models that exploit different gating mechanisms and transformations. We provide a detailed examination of the PRU and its behavior on language modeling tasks. Our code is open-source and available at https://sacmehta.github.io/PRU/.
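The abstract names grouped linear transformations as one replacement for the dense linear map in an LSTM but does not spell out the mechanics. The following is a minimal sketch of a generic grouped linear transformation in plain Python, not the authors' implementation: the input vector is split into equal chunks, each chunk is multiplied by its own small matrix, and the results are concatenated. The names `grouped_linear`, `make_weights`, and `param_count` are hypothetical and chosen for illustration only.

```python
import random


def grouped_linear(x, weights):
    """Apply a grouped linear transformation to the vector x.

    weights is a list with one matrix per group; each matrix is a list of
    rows, and each row has as many entries as one input chunk.
    """
    groups = len(weights)
    if len(x) % groups != 0:
        raise ValueError("input size must be divisible by the number of groups")
    chunk = len(x) // groups
    out = []
    for i, matrix in enumerate(weights):
        part = x[i * chunk:(i + 1) * chunk]  # the slice seen by group i
        for row in matrix:
            out.append(sum(w * v for w, v in zip(row, part)))
    return out


def make_weights(n_in, n_out, groups, seed=0):
    """Build small random per-group matrices (illustrative initialisation)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_in // groups)]
         for _ in range(n_out // groups)]
        for _ in range(groups)
    ]


def param_count(weights):
    """Count the scalar weights across all groups."""
    return sum(len(row) for matrix in weights for row in matrix)


n_in, n_out = 8, 8
dense = make_weights(n_in, n_out, groups=1)    # ordinary linear map
grouped = make_weights(n_in, n_out, groups=4)  # four independent sub-maps
y = grouped_linear([1.0] * n_in, grouped)

print(param_count(dense), param_count(grouped), len(y))
```

With `g` groups, the number of weights falls from `n_in * n_out` to `n_in * n_out / g`, which illustrates how a grouped transformation can keep the output dimensionality while using fewer parameters.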
Anthology ID:
D18-1491
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4620–4630
URL:
https://aclanthology.org/D18-1491
DOI:
10.18653/v1/D18-1491
Cite (ACL):
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, and Hannaneh Hajishirzi. 2018. Pyramidal Recurrent Unit for Language Modeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4620–4630, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Pyramidal Recurrent Unit for Language Modeling (Mehta et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1491.pdf
Code
sacmehta/PRU
Data
Penn Treebank, WikiText-2