%0 Conference Proceedings
%T Layer-wise Model Pruning based on Mutual Information
%A Fan, Chun
%A Li, Jiwei
%A Zhang, Tianwei
%A Ao, Xiang
%A Wu, Fei
%A Meng, Yuxian
%A Sun, Xiaofei
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F fan-etal-2021-layer
%X Inspired by mutual information (MI) based feature selection in SVMs and logistic regression, in this paper, we propose MI-based layer-wise pruning: for each layer of a multi-layer neural network, neurons with higher values of MI with respect to the preserved neurons in the upper layer are themselves preserved. Starting from the top softmax layer, layer-wise pruning proceeds in a top-down fashion until reaching the bottom word embedding layer. The proposed pruning strategy offers merits over weight-based pruning techniques: (1) it avoids irregular memory access, since representations and matrices can be squeezed into smaller but dense counterparts, leading to greater speedup; (2) by pruning top-down, the proposed method operates from a more global perspective based on training signals in the top layer, and prunes each layer by propagating the effect of these global signals through the layers, leading to better performance at the same sparsity level. Extensive experiments show that at the same sparsity level, the proposed strategy offers both greater speedup and higher performance than weight-based pruning methods (e.g., magnitude pruning, movement pruning).
%R 10.18653/v1/2021.emnlp-main.246
%U https://aclanthology.org/2021.emnlp-main.246
%U https://doi.org/10.18653/v1/2021.emnlp-main.246
%P 3079-3090