Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling

Haw-Shiuan Chang; Ruei-Yao Sun; Kathryn Ricci; Andrew McCallum

doi:10.18653/v1/2023.acl-long.48

Abstract

Ensembling BERT models often significantly improves accuracy, but at the cost of significantly more computation and a larger memory footprint. In this work, we propose Multi-CLS BERT, a novel ensembling method for CLS-based prediction tasks that is almost as efficient as a single BERT model. Multi-CLS BERT uses multiple CLS tokens with a parameterization and objective that encourages their diversity. Thus, instead of fine-tuning each BERT model in an ensemble (and running them all at test time), we need only fine-tune our single Multi-CLS BERT model (and run the one model at test time, ensembling just the multiple final CLS embeddings). To test its effectiveness, we build Multi-CLS BERT on top of a state-of-the-art pretraining method for BERT (Aroca-Ouellette and Rudzicz, 2020). In experiments on GLUE and SuperGLUE, we show that our Multi-CLS BERT reliably improves both overall accuracy and confidence estimation. When only 100 training samples are available in GLUE, the Multi-CLS BERT_Base model can even outperform the corresponding BERT_Large model. We analyze the behavior of our Multi-CLS BERT, showing that it has many of the same characteristics and behavior as a typical BERT 5-way ensemble, but with nearly 4 times less computation and memory.
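
The idea of ensembling several final CLS embeddings from a single forward pass can be illustrated with a minimal sketch. The abstract does not say how the embeddings are combined, so everything here is an assumption made for illustration: one linear head per CLS embedding, a softmax per head, and a plain average of the resulting class probabilities. The names `linear_head`, `softmax`, and `ensemble_cls_predictions` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Head = Tuple[List[Vector], Vector]  # (weight rows, bias), one head per CLS token


def linear_head(weights: List[Vector], bias: Vector, embedding: Vector) -> Vector:
    """Compute logits = W @ embedding + b for one classification head."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_cls_predictions(cls_embeddings: List[Vector], heads: List[Head]) -> Vector:
    """Average class probabilities over K CLS embeddings.

    ASSUMPTION: probability averaging stands in for whatever aggregation
    the paper actually uses; the abstract does not specify it.
    """
    per_cls_probs = [
        softmax(linear_head(weights, bias, emb))
        for emb, (weights, bias) in zip(cls_embeddings, heads)
    ]
    num_cls = len(per_cls_probs)
    num_classes = len(per_cls_probs[0])
    return [sum(p[c] for p in per_cls_probs) / num_cls for c in range(num_classes)]


# Toy usage: three made-up 2-dimensional CLS embeddings and a 2-class task.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity_head: Head = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
toy_heads = [identity_head, identity_head, identity_head]
print(ensemble_cls_predictions(toy_embeddings, toy_heads))
```

In this sketch the encoder would run once and only the small per-CLS heads are repeated, which is the intuition behind the efficiency claim; a traditional ensemble would instead run one full model per member.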

Anthology ID: 2023.acl-long.48
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 821–854
URL: https://aclanthology.org/2023.acl-long.48
DOI: 10.18653/v1/2023.acl-long.48
Cite (ACL): Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, and Andrew McCallum. 2023. Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 821–854, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling (Chang et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.48.pdf
Video: https://aclanthology.org/2023.acl-long.48.mp4
