A Differentially Private Text Perturbation Method Using Regularized Mahalanobis Metric

Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, Nathanael Teissier


Abstract
Balancing the privacy-utility tradeoff is a crucial requirement of many practical machine learning systems that deal with sensitive customer data. A popular approach for privacy-preserving text analysis is noise injection, in which text data is first mapped into a continuous embedding space, perturbed by sampling spherical noise from an appropriate distribution, and then projected back to the discrete vocabulary space. While this allows the perturbation to admit the required metric differential privacy, the utility of downstream tasks modeled on this perturbed data is often low because the spherical noise does not account for the variability in the density around different words in the embedding space. In particular, words in a sparse region are likely to remain unchanged even when the noise scale is large. In this paper, we propose a text perturbation mechanism based on a carefully designed regularized variant of the Mahalanobis metric to overcome this problem. For any given noise scale, this metric adds elliptical noise to account for the covariance structure in the embedding space. This heterogeneity in the noise scale along different directions helps ensure that words in sparse regions have a sufficient likelihood of replacement without sacrificing overall utility. We provide a text-perturbation algorithm based on this metric and formally prove its privacy guarantees. Additionally, we empirically show that our mechanism improves the privacy statistics while achieving the same level of utility as the state-of-the-art Laplace mechanism.
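The pipeline described in the abstract (embed a word, add elliptical noise shaped by the covariance of the embedding space, project back to the nearest vocabulary word) can be illustrated with a minimal sketch. The abstract does not give the sampling procedure, so everything specific here is an assumption: the regularized covariance is taken to be a convex combination `lam * Sigma + (1 - lam) * I` with a hypothetical tuning parameter `lam`, the noise radius is drawn from a Gamma distribution with scale `1 / epsilon`, the projection uses Euclidean nearest neighbor, and the toy two-dimensional vocabulary is invented for illustration. The function names are hypothetical, and the sketch makes no claim to reproduce the paper's algorithm or its privacy proof.

```python
import math
import random


def regularized_covariance(vectors, lam):
    """Return lam * Sigma + (1 - lam) * I (assumed form of the regularization).

    Sigma is the sample covariance of the embeddings, rescaled so that its
    trace equals the dimension m, which keeps the overall noise scale fixed.
    """
    n, m = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(m)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            for j in range(m)] for i in range(m)]
    scale = m / sum(cov[i][i] for i in range(m))
    return [[lam * cov[i][j] * scale + ((1 - lam) if i == j else 0.0)
             for j in range(m)] for i in range(m)]


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a must be positive definite)."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_elliptical_noise(factor, epsilon, rng):
    """Draw a uniform direction on the unit sphere, stretch it by `factor`,
    and scale it by a Gamma(m, 1 / epsilon) radius."""
    m = len(factor)
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in g))
    direction = [x / norm for x in g]
    radius = rng.gammavariate(m, 1.0 / epsilon)  # second argument is the scale
    return [radius * sum(factor[i][k] * direction[k] for k in range(m))
            for i in range(m)]


def perturb_word(word, embeddings, factor, epsilon, rng):
    """Add elliptical noise to the word's vector, then return the nearest word."""
    noise = sample_elliptical_noise(factor, epsilon, rng)
    noisy = [e + z for e, z in zip(embeddings[word], noise)]
    return min(embeddings, key=lambda w: math.dist(embeddings[w], noisy))


# Toy vocabulary (invented): four words in a dense region, one isolated word.
embeddings = {
    "good": [1.0, 0.2],
    "great": [1.1, 0.3],
    "fine": [0.9, 0.1],
    "bad": [-1.0, -0.2],
    "zebra": [4.0, 3.0],
}
rng = random.Random(0)
sigma_reg = regularized_covariance(list(embeddings.values()), lam=0.5)
factor = cholesky(sigma_reg)
print(perturb_word("zebra", embeddings, factor, epsilon=2.0, rng=rng))
```

Setting `lam=0` reduces the regularized matrix to the identity, which recovers spherical noise, while values of `lam` closer to 1 stretch the noise along the directions of largest variance in the embeddings. The sketch uses a Cholesky factor rather than a symmetric matrix square root; because the direction is uniform on the sphere, either factor yields the same noise distribution. With `lam=1` the matrix can be singular, in which case this Cholesky routine would fail.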
Anthology ID:
2020.privatenlp-1.2
Volume:
Proceedings of the Second Workshop on Privacy in NLP
Month:
November
Year:
2020
Address:
Online
Editors:
Oluwaseyi Feyisetan, Sepideh Ghanavati, Shervin Malmasi, Patricia Thaine
Venue:
PrivateNLP
Publisher:
Association for Computational Linguistics
Pages:
7–17
URL:
https://aclanthology.org/2020.privatenlp-1.2
DOI:
10.18653/v1/2020.privatenlp-1.2
Cite (ACL):
Zekun Xu, Abhinav Aggarwal, Oluwaseyi Feyisetan, and Nathanael Teissier. 2020. A Differentially Private Text Perturbation Method Using Regularized Mahalanobis Metric. In Proceedings of the Second Workshop on Privacy in NLP, pages 7–17, Online. Association for Computational Linguistics.
Cite (Informal):
A Differentially Private Text Perturbation Method Using Regularized Mahalanobis Metric (Xu et al., PrivateNLP 2020)
PDF:
https://aclanthology.org/2020.privatenlp-1.2.pdf
Video:
https://slideslive.com/38939770