Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Mitchell Gordon, Kevin Duh, Nicholas Andrews


Abstract
Pre-trained universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. While effective, feature extractors like BERT may be prohibitively large for some deployment scenarios. We explore weight pruning for BERT and ask: how does compression during pre-training affect transfer learning? We find that pruning affects transfer learning in three broad regimes. Low levels of pruning (30-40%) do not affect pre-training loss or transfer to downstream tasks at all. Medium levels of pruning increase the pre-training loss and prevent useful pre-training information from being transferred to downstream tasks. High levels of pruning additionally prevent models from fitting downstream datasets, leading to further degradation. Finally, we observe that fine-tuning BERT on a specific task does not improve its prunability. We conclude that BERT can be pruned once during pre-training rather than separately for each task without affecting performance.
Anthology ID:
2020.repl4nlp-1.18
Volume:
Proceedings of the 5th Workshop on Representation Learning for NLP
Month:
July
Year:
2020
Address:
Online
Editors:
Spandana Gella, Johannes Welbl, Marek Rei, Fabio Petroni, Patrick Lewis, Emma Strubell, Minjoon Seo, Hannaneh Hajishirzi
Venue:
RepL4NLP
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Pages:
143–155
URL:
https://aclanthology.org/2020.repl4nlp-1.18
DOI:
10.18653/v1/2020.repl4nlp-1.18
Cite (ACL):
Mitchell Gordon, Kevin Duh, and Nicholas Andrews. 2020. Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. In Proceedings of the 5th Workshop on Representation Learning for NLP, pages 143–155, Online. Association for Computational Linguistics.
Cite (Informal):
Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (Gordon et al., RepL4NLP 2020)
PDF:
https://aclanthology.org/2020.repl4nlp-1.18.pdf
Video:
http://slideslive.com/38929784
Code:
mitchellgordon95/bert-prune
Data:
GLUE