@inproceedings{anil-etal-2022-large,
title = "Large-Scale Differentially Private {BERT}",
author = "Anil, Rohan and
Ghazi, Badih and
Gupta, Vineet and
Kumar, Ravi and
Manurangsi, Pasin",
editor = "Goldberg, Yoav and
Kozareva, Zornitsa and
Zhang, Yue",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-emnlp.484",
doi = "10.18653/v1/2022.findings-emnlp.484",
pages = "6481--6491",
abstract = "In this work, we study the large-scale pretraining of BERT-Large (Devlin et al., 2019) with differentially private SGD (DP-SGD). We show that combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance the training efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of Subramani et al (2020), who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX (Bradbury et al., 2018; Frostig et al., 2018) primitives in conjunction with the XLA compiler (XLA team and collaborators, 2017). Our implementation achieves a masked language model accuracy of 60.5{\%} at a batch size of 2M, for epsilon=5, which is a reasonable privacy setting. To put this number in perspective, non-private BERT models achieve an accuracy of ∼70{\%}.",
}
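
The abstract above lays out the core recipe: DP-SGD (per-example gradient clipping plus Gaussian noise) run at very large batch sizes, with the per-example work made cheap by JAX primitives compiled through XLA. Below is a minimal, illustrative sketch of a single DP-SGD step in that style; the toy linear model, `loss_fn`, and every hyperparameter (`lr`, `clip_norm`, `noise_multiplier`) are placeholder assumptions for illustration, not the paper's BERT implementation.

```python
# Illustrative DP-SGD step in JAX: per-example gradients via vmap + grad,
# per-example L2 clipping, Gaussian noise on the averaged gradient, all
# compiled by XLA through jit. A sketch under assumed names, not the
# paper's actual code.
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Toy per-example squared error for a linear model; stands in for the
    # masked-language-model loss used in the paper.
    pred = jnp.dot(x, params["w"]) + params["b"]
    return (pred - y) ** 2


def clip_grad(grad, clip_norm):
    # Rescale one example's gradient so its global L2 norm is <= clip_norm.
    leaves = jax.tree_util.tree_leaves(grad)
    total_norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
    scale = jnp.minimum(1.0, clip_norm / (total_norm + 1e-12))
    return jax.tree_util.tree_map(lambda g: g * scale, grad)


@jax.jit
def dp_sgd_step(params, x, y, key, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    batch_size = x.shape[0]
    # Per-example gradients: map grad over the batch axis of x and y.
    per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, x, y)
    clipped = jax.vmap(clip_grad, in_axes=(0, None))(per_example_grads, clip_norm)
    mean_grad = jax.tree_util.tree_map(lambda g: jnp.mean(g, axis=0), clipped)
    # Gaussian noise calibrated to the clipping norm: the noise stddev on the
    # *averaged* gradient is noise_multiplier * clip_norm / batch_size.
    leaves, treedef = jax.tree_util.tree_flatten(mean_grad)
    keys = jax.random.split(key, len(leaves))
    sigma = noise_multiplier * clip_norm / batch_size
    noisy = [g + sigma * jax.random.normal(k, g.shape) for g, k in zip(leaves, keys)]
    noisy_grad = jax.tree_util.tree_unflatten(treedef, noisy)
    # Plain SGD update on the noised gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, noisy_grad)


# Example usage with random data (shapes only; not the paper's setup).
params = {"w": jnp.zeros(4), "b": jnp.zeros(())}
x = jax.random.normal(jax.random.PRNGKey(0), (32, 4))
y = jnp.ones(32)
params = dp_sgd_step(params, x, y, jax.random.PRNGKey(1))
```

Because the noise added to the averaged gradient has standard deviation noise_multiplier * clip_norm / batch_size, growing the batch size toward the millions directly improves each step's signal-to-noise ratio, which is the utility gain the abstract attributes to mega-batches.
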
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="anil-etal-2022-large">
<titleInfo>
<title>Large-Scale Differentially Private BERT</title>
</titleInfo>
<name type="personal">
<namePart type="given">Rohan</namePart>
<namePart type="family">Anil</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Badih</namePart>
<namePart type="family">Ghazi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Vineet</namePart>
<namePart type="family">Gupta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ravi</namePart>
<namePart type="family">Kumar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pasin</namePart>
<namePart type="family">Manurangsi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2022</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yoav</namePart>
<namePart type="family">Goldberg</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zornitsa</namePart>
<namePart type="family">Kozareva</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yue</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, United Arab Emirates</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In this work, we study the large-scale pretraining of BERT-Large (Devlin et al., 2019) with differentially private SGD (DP-SGD). We show that combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance the training efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of Subramani et al (2020), who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX (Bradbury et al., 2018; Frostig et al., 2018) primitives in conjunction with the XLA compiler (XLA team and collaborators, 2017). Our implementation achieves a masked language model accuracy of 60.5% at a batch size of 2M, for epsilon=5, which is a reasonable privacy setting. To put this number in perspective, non-private BERT models achieve an accuracy of ∼70%.</abstract>
<identifier type="citekey">anil-etal-2022-large</identifier>
<identifier type="doi">10.18653/v1/2022.findings-emnlp.484</identifier>
<location>
<url>https://aclanthology.org/2022.findings-emnlp.484</url>
</location>
<part>
<date>2022-12</date>
<extent unit="page">
<start>6481</start>
<end>6491</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Large-Scale Differentially Private BERT
%A Anil, Rohan
%A Ghazi, Badih
%A Gupta, Vineet
%A Kumar, Ravi
%A Manurangsi, Pasin
%Y Goldberg, Yoav
%Y Kozareva, Zornitsa
%Y Zhang, Yue
%S Findings of the Association for Computational Linguistics: EMNLP 2022
%D 2022
%8 December
%I Association for Computational Linguistics
%C Abu Dhabi, United Arab Emirates
%F anil-etal-2022-large
%X In this work, we study the large-scale pretraining of BERT-Large (Devlin et al., 2019) with differentially private SGD (DP-SGD). We show that, combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance training efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of Subramani et al. (2020), who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX (Bradbury et al., 2018; Frostig et al., 2018) primitives in conjunction with the XLA compiler (XLA team and collaborators, 2017). Our implementation achieves a masked language model accuracy of 60.5% at a batch size of 2M, for epsilon=5, which is a reasonable privacy setting. To put this number in perspective, non-private BERT models achieve an accuracy of ∼70%.
%R 10.18653/v1/2022.findings-emnlp.484
%U https://aclanthology.org/2022.findings-emnlp.484
%U https://doi.org/10.18653/v1/2022.findings-emnlp.484
%P 6481-6491
Markdown (Informal)
[Large-Scale Differentially Private BERT](https://aclanthology.org/2022.findings-emnlp.484) (Anil et al., Findings 2022)
ACL
- Rohan Anil, Badih Ghazi, Vineet Gupta, Ravi Kumar, and Pasin Manurangsi. 2022. Large-Scale Differentially Private BERT. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6481–6491, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.