PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting

Zhen Zhang, Wei Zhu, Jinfan Zhang, Peng Wang, Rize Jin, Tae-Sun Chung


Abstract
BERT and other pretrained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (CITATION), the significant latency during inference prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT will make the early exiting decision if enough numbers (patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer’s prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at https://github.com/michael-wzhu/PCEE-BERT.
Anthology ID:
2022.findings-naacl.25
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
327–338
Language:
URL:
https://aclanthology.org/2022.findings-naacl.25
DOI:
10.18653/v1/2022.findings-naacl.25
Bibkey:
Cite (ACL):
Zhen Zhang, Wei Zhu, Jinfan Zhang, Peng Wang, Rize Jin, and Tae-Sun Chung. 2022. PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 327–338, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting (Zhang et al., Findings 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.findings-naacl.25.pdf
Code
 michael-wzhu/pcee-bert
Data
CIFAR-10CIFAR-100GLUEQNLI