BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression

Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin


Abstract
The slow inference speed of BERT has motivated much research on accelerating it, and early exiting has been proposed as a way to trade off model quality against efficiency. This paper aims to address two weaknesses of previous work: (1) existing fine-tuning strategies for early exiting models fail to take full advantage of BERT; (2) methods for making exiting decisions are limited to classification tasks. We propose a more advanced fine-tuning strategy and a learning-to-exit module that extends early exiting to tasks other than classification. Experiments demonstrate improved early exiting for BERT, with better trade-offs obtained by the proposed fine-tuning strategy, successful application to regression tasks, and the possibility of combining it with other acceleration methods. Source code can be found at https://github.com/castorini/berxit.
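The abstract does not spell out the mechanics, so the following is only a toy, pure-Python sketch under stated assumptions, not the authors' implementation (their code is in the repository linked above). It illustrates why a learned exit decision extends to regression: instead of measuring a classifier's confidence, a separate exit module scores each layer's hidden state, and inference stops at the first layer whose score reaches a threshold. Assumptions, all hypothetical: the exit module is a single linear unit with a sigmoid shared across layers; its regression training target is 1 - tanh(|prediction - label|); and "alternating" fine-tuning switches between the final-layer loss and a depth-weighted average of all layers' losses. The function names (`early_exit_predict`, `exit_certainty`, `regression_exit_target`, `alternating_loss`) and the toy layers and heads are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]  # hypothetical stand-in for one transformer layer
Head = Callable[[Vector], float]    # hypothetical per-layer output head; a float, so regression works


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def exit_certainty(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Assumed exit module: one linear unit plus sigmoid, shared by all layers."""
    return sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias)


def early_exit_predict(
    x: Vector,
    layers: Sequence[Layer],
    heads: Sequence[Head],
    exit_weights: Sequence[float],
    exit_bias: float,
    threshold: float,
) -> Tuple[float, int]:
    """Run layers in order and stop at the first layer whose certainty reaches the threshold.

    Returns (prediction, number of layers actually executed). The last layer always exits.
    """
    hidden = list(x)
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        hidden = layer(hidden)
        is_last = depth == len(layers)
        if is_last or exit_certainty(hidden, exit_weights, exit_bias) >= threshold:
            return head(hidden), depth
    raise ValueError("model has no layers")


def regression_exit_target(prediction: float, label: float) -> float:
    """Assumed exit-module target for regression: 1.0 for an exact prediction,
    decaying toward 0.0 as the absolute error grows."""
    return 1.0 - math.tanh(abs(prediction - label))


def alternating_loss(step: int, layer_losses: Sequence[float]) -> float:
    """Assumed alternating schedule: even steps use only the final layer's loss,
    odd steps use an average of all layers' losses weighted by layer depth."""
    if step % 2 == 0:
        return layer_losses[-1]
    weights = range(1, len(layer_losses) + 1)
    return sum(w * loss for w, loss in zip(weights, layer_losses)) / sum(weights)


# Toy 3-layer "model": each layer adds 1.0 to every hidden unit, each head sums the hidden vector.
layers = [lambda h: [v + 1.0 for v in h]] * 3
heads = [lambda h: sum(h)] * 3

# Layer 1 certainty is sigmoid(-1) < 0.5; layer 2 certainty is sigmoid(1) >= 0.5, so it exits there.
print(early_exit_predict([0.0, 0.0], layers, heads, [1.0, 1.0], -3.0, 0.5))  # (4.0, 2)
```

Raising `threshold` makes the model run deeper (slower but, if the exit module is well trained, more accurate), which is the quality/efficiency trade-off the abstract refers to. Because the exit score never looks at class probabilities, the same loop serves classification and regression heads alike.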
Anthology ID:
2021.eacl-main.8
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
91–104
URL:
https://aclanthology.org/2021.eacl-main.8
DOI:
10.18653/v1/2021.eacl-main.8
Cite (ACL):
Ji Xin, Raphael Tang, Yaoliang Yu, and Jimmy Lin. 2021. BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 91–104, Online. Association for Computational Linguistics.
Cite (Informal):
BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression (Xin et al., EACL 2021)
PDF:
https://aclanthology.org/2021.eacl-main.8.pdf
Code
castorini/berxit
Data
GLUE, QNLI, SICK