PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
Zhen Zhang | Wei Zhu | Jinfan Zhang | Peng Wang | Rize Jin | Tae-Sun Chung
Findings of the Association for Computational Linguistics: NAACL 2022
BERT and other pretrained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (CITATION), the significant latency during inference prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT will make the early exiting decision if enough numbers (patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer’s prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at https://github.com/michael-wzhu/PCEE-BERT.
In this paper we describe our submission to the shared tasks of the 9th Workshop on Asian Translation (WAT 2022) on NICT–SAP under the team name ”HwTscSU”. The tasks involve translation from 5 languages into English and vice-versa in two domains: IT domain and Wikinews domain. The purpose is to determine the feasibility of multilingualism, domain adaptation or document-level knowledge given very little to none clean parallel corpora for training. Our approach for all translation tasks mainly focused on pre-training NMT models on general datasets and fine-tuning them on domain-specific datasets. Due to the small amount of parallel corpora, we collected and cleaned the OPUS dataset including three IT domain corpora, i.e., GNOME, KDE4, and Ubuntu. We then trained Transformer models on the collected dataset and fine-tuned on corresponding dev set. The BLEU scores greatly improved in comparison with other systems. Our submission ranked 1st in all IT-domain tasks and in one out of eight ALT domain tasks.