Training Text-to-Text Transformers with Privacy Guarantees

Natalia Ponomareva; Jasmijn Bastings; Sergei Vassilvitskii

doi:10.18653/v1/2022.privatenlp-1.4

Abstract

Recent advances in NLP often stem from large transformer-based pre-trained models, which are rapidly growing in size and are trained on ever more data. Such models are often released to the public so that end users can fine-tune them on a task dataset. While it is common to treat pre-training data as public, it may still contain personally identifiable information (PII), such as names, phone numbers, and copyrighted material. Recent findings show that the capacity of these models allows them to memorize parts of the training data, and suggest differentially private (DP) training as a potential mitigation. While there is recent work on DP fine-tuning of NLP models, the effects of DP pre-training are less well understood: it is not clear how downstream performance is affected by DP pre-training, and whether DP pre-training mitigates some of the memorization concerns. We focus on T5 and show that, by using recent advances in JAX and XLA, we can train models with DP that do not suffer a large drop in either pre-training utility or training speed, and that can still be fine-tuned to high accuracy on downstream tasks (e.g., GLUE). Moreover, we show that T5's span corruption is a good defense against data memorization.
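The abstract credits JAX and XLA for keeping DP training fast. One plausible reason is that DP-SGD-style training needs per-example gradients (so each example's contribution can be clipped before Gaussian noise is added), and JAX can compute these in one compiled step with `jax.vmap`. The sketch below shows a single DP-SGD update on a toy linear model. It is an illustration under stated assumptions, not the paper's T5 implementation: the model, the function names (`loss_fn`, `clip_by_global_norm`, `dp_sgd_step`) and the hyperparameters (`lr`, `max_norm`, `noise_multiplier`) are hypothetical, and privacy accounting (turning the noise level into an ε guarantee) is omitted.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Hypothetical toy model: linear regression, squared error on ONE example.
    pred = jnp.dot(x, params["w"]) + params["b"]
    return 0.5 * (pred - y) ** 2


def clip_by_global_norm(grads, max_norm):
    # Scale one example's gradient pytree so its total L2 norm is <= max_norm.
    leaves = jax.tree_util.tree_leaves(grads)
    norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
    scale = jnp.minimum(1.0, max_norm / (norm + 1e-12))
    return jax.tree_util.tree_map(lambda g: g * scale, grads)


@jax.jit  # XLA compiles the whole step, including the vmapped per-example work.
def dp_sgd_step(params, xs, ys, key, lr=0.1, max_norm=1.0, noise_multiplier=1.1):
    # 1. Per-example gradients: map over the batch axis of xs/ys, share params.
    per_example = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    # 2. Clip each example's gradient independently.
    clipped = jax.vmap(clip_by_global_norm, in_axes=(0, None))(per_example, max_norm)
    # 3. Sum the clipped gradients over the batch.
    summed = jax.tree_util.tree_map(lambda g: jnp.sum(g, axis=0), clipped)
    # 4. Add Gaussian noise with std = noise_multiplier * max_norm to each leaf.
    leaves, treedef = jax.tree_util.tree_flatten(summed)
    keys = jax.random.split(key, len(leaves))
    noisy = [
        g + noise_multiplier * max_norm * jax.random.normal(k, g.shape)
        for g, k in zip(leaves, keys)
    ]
    noisy = jax.tree_util.tree_unflatten(treedef, noisy)
    # 5. Average over the batch and take a plain SGD step.
    batch_size = xs.shape[0]
    return jax.tree_util.tree_map(lambda p, g: p - lr * g / batch_size, params, noisy)
```

For example, `dp_sgd_step({"w": jnp.zeros(3), "b": jnp.array(0.0)}, jnp.ones((4, 3)), jnp.ones(4), jax.random.PRNGKey(0))` returns updated parameters with the same structure. Vectorizing the per-example gradients with `jax.vmap` inside one `jax.jit`-compiled function avoids a Python loop over examples, which is the kind of overhead that tends to make naive DP training slow.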

Anthology ID:: 2022.privatenlp-1.4
Volume:: Proceedings of the Fourth Workshop on Privacy in Natural Language Processing
Month:: July
Year:: 2022
Address:: Seattle, United States
Editors:: Oluwaseyi Feyisetan, Sepideh Ghanavati, Patricia Thaine, Ivan Habernal, Fatemehsadat Mireshghallah
Venue:: PrivateNLP
Publisher:: Association for Computational Linguistics
Pages:: 21–21
URL:: https://aclanthology.org/2022.privatenlp-1.4
DOI:: 10.18653/v1/2022.privatenlp-1.4
Cite (ACL):: Natalia Ponomareva, Jasmijn Bastings, and Sergei Vassilvitskii. 2022. Training Text-to-Text Transformers with Privacy Guarantees. In Proceedings of the Fourth Workshop on Privacy in Natural Language Processing, pages 21–21, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):: Training Text-to-Text Transformers with Privacy Guarantees (Ponomareva et al., PrivateNLP 2022)
PDF:: https://aclanthology.org/2022.privatenlp-1.4.pdf
Video:: https://aclanthology.org/2022.privatenlp-1.4.mp4
Data:: C4, GLUE, QNLI
