Low-rank passthrough neural networks

Antonio Valerio Miceli Barone


Abstract
Various common deep learning architectures, such as LSTMs, GRUs, ResNets and Highway Networks, employ state passthrough connections that support training with high feed-forward depth or recurrence over many time steps. These “Passthrough Network” architectures also enable decoupling the network state size from the number of parameters of the network, a possibility studied by Sak et al. (2014) with their low-rank parametrization of the LSTM. In this work we extend this line of research, proposing effective low-rank and low-rank plus diagonal matrix parametrizations for Passthrough Networks that exploit this decoupling property, reducing the data complexity and memory requirements of the network while preserving its memory capacity. This is particularly beneficial in low-resource settings, as it supports expressive models with a compact parametrization that is less susceptible to overfitting. We present competitive experimental results on several tasks, including language modeling and a near-state-of-the-art result on sequential randomly-permuted MNIST classification, a hard task on natural data.
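
To make the abstract's central idea concrete, below is a minimal illustrative sketch (not the paper's code; variable names, shapes and the chosen rank are assumptions) of how a low-rank or low-rank plus diagonal parametrization replaces a full n-by-n state transition matrix, decoupling parameter count from state size.

    # Illustrative sketch of low-rank and low-rank plus diagonal parametrizations.
    # Not taken from the paper or the linked repository; values are assumptions.
    import numpy as np

    n, r = 512, 32          # state size n, rank r << n (illustrative values)
    rng = np.random.default_rng(0)

    # Full-rank matrix: n * n parameters.
    W_full = rng.standard_normal((n, n)) * 0.01

    # Low-rank parametrization W ~= U @ V: 2 * n * r parameters.
    U = rng.standard_normal((n, r)) * 0.01
    V = rng.standard_normal((r, n)) * 0.01

    # Low-rank plus diagonal W ~= diag(d) + U @ V: adds only n parameters.
    d = np.ones(n)

    def apply_low_rank(U, V, h):
        # Never materialize the n x n matrix: cost O(n * r) instead of O(n^2).
        return U @ (V @ h)

    def apply_low_rank_plus_diag(d, U, V, h):
        # Diagonal term lets each state unit pass its own value through directly.
        return d * h + U @ (V @ h)

    h = rng.standard_normal(n)
    print("full-rank params:      ", n * n)          # 262144
    print("low-rank params:       ", 2 * n * r)      # 32768
    print("low-rank + diag params:", 2 * n * r + n)  # 33280

With these illustrative sizes, the state dimension stays at 512 while the parameter count of the transition matrix drops by roughly a factor of eight, which is the compactness the abstract argues is useful in low-resource settings.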
Anthology ID:
W18-3410
Volume:
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Month:
July
Year:
2018
Address:
Melbourne
Editors:
Reza Haffari, Colin Cherry, George Foster, Shahram Khadivi, Bahar Salehi
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
77–86
URL:
https://aclanthology.org/W18-3410
DOI:
10.18653/v1/W18-3410
Bibkey:
Cite (ACL):
Antonio Valerio Miceli Barone. 2018. Low-rank passthrough neural networks. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, pages 77–86, Melbourne. Association for Computational Linguistics.
Cite (Informal):
Low-rank passthrough neural networks (Miceli Barone, ACL 2018)
PDF:
https://aclanthology.org/W18-3410.pdf
Code:
Avmb/lowrank-gru (plus additional community code)