When differential privacy meets NLP: The devil is in the detail

Differential privacy provides a formal approach to privacy of individuals. Applications of differential privacy to various scenarios, such as protecting users' original utterances, must satisfy certain mathematical properties. Our contribution is a formal analysis of ADePT, a differentially private auto-encoder for text rewriting (Krishna et al., 2021). ADePT achieves promising results on downstream tasks while providing tight privacy guarantees. Our proof reveals that ADePT is not differentially private, thus rendering the experimental results unsubstantiated. We also quantify the impact of the error in its private mechanism, showing that the true sensitivity is higher by a factor of at least 6 in an optimistic case of a very small encoder dimension, and that the proportion of utterances that are not privatized could easily reach 100% of the entire dataset. Our intention is neither to criticize the authors nor the peer-reviewing process, but rather to point out that if differential privacy applications in NLP rely on formal guarantees, these should be outlined in full and put under detailed scrutiny.


Introduction
The need for NLP systems to protect individuals' privacy has led to the adoption of differential privacy (DP). DP methods formally guarantee that an algorithm's output will be 'roughly the same' regardless of whether or not any single individual is present in the central dataset; this is achieved by employing randomized algorithms (Dwork and Roth, 2013). Local DP, a variant of DP, removes the need for a central dataset and applies randomization to each individual's datapoint. Local DP thus guarantees that its output for an individual A will be 'almost indistinguishable' from the output for any other individual B or C. This level of privacy protection makes local DP an ideal framework for NLP applications that operate on sensitive user input which should not be collected and processed globally by an untrusted party, e.g., users' verbatim utterances. When the utterances are 'privatized' by local DP, no future post-processing or adversarial attack can reveal more than allowed by the particular local DP algorithm's properties (namely the ε parameter; see Sec. 2).
ADePT, a local DP algorithm recently published at EACL by Krishna et al. (2021) from Amazon Alexa, proposed a differentially private auto-encoder for text rewriting. In summary, ADePT takes an input textual utterance and rewrites it such that the output satisfies local DP guarantees. Unfortunately, a thorough formal analysis reveals that ADePT is in fact not differentially private, and the privatized data thus do not protect individuals' privacy as formally promised.
In this short paper, we shed light on ADePT's main argument, the privacy mechanism. We briefly introduce key concepts from differential privacy (DP) and present a detailed proof of the Laplace mechanism (Sec. 2). Section 3 introduces ADePT's architecture (Krishna et al., 2021) and its main privacy argument. We formally prove that the proposed ADePT mechanism is in fact not differentially private (Sec. 4) and determine the actual sensitivity of its private mechanism (Sec. 5). We sketch to what extent ADePT breaches privacy as opposed to the formal DP guarantees (Sec. 6) and discuss a potential adversarial attack (Appendix C).

Theoretical background
From a high-level perspective, DP works with the notion of individuals whose information is contained in a database (dataset). Each individual's datapoint (or record), which could be a single bit, a number, a vector, a structured record, a text document, or any arbitrary object, is considered private and cannot be revealed. Moreover, even whether or not any particular individual A is in the database is considered private.
Definition 2.1. Let X be a 'universe' of all records and x, y ∈ X be two datasets from this universe. We say that x and y are neighboring datasets if they differ in one record.
For example, let dataset x consist of |x| documents, where each document is associated with an individual whose privacy we want to preserve. Let y differ from x by one document, so either |y| = |x| ± 1, or |y| = |x| with the i-th document replaced. Then, by Definition 2.1, x and y are neighboring datasets.
Global DP and queries In a typical setup, the database is not public but held by a trusted curator. Only the curator can fully access all datapoints and answer any query we might have, for example, how many individuals are in the database, whether or not B is in there, what the most common disease is (if the database is medical), what the average document length is (if the database contains texts), and so on. The types of queries are task-specific, and we can see them simply as functions with an arbitrary domain X and co-domain Z. In this paper, we focus on a simple query type, the numerical query, that is, a function with co-domain R^n.
For example, consider a dataset x ∈ X containing textual documents and a numerical query f : X → R that returns the average document length. Let's assume that the length of each document is private, sensitive information, and let the dataset x contain a particular individual A whose privacy we want to breach. Say we also have some leaked background knowledge, namely a neighboring dataset y ∈ X that contains all datapoints from x except for A. Now, if the trusted curator returned the true value of f(x), we could easily compute A's document length, as we know f(y), and thus we could breach A's privacy. To protect A's privacy, we will employ randomization.
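Before adding randomness, it is instructive to see how trivially the exact answers fail. A minimal sketch of this differencing attack, with made-up numbers purely for illustration:

```python
# Differencing attack on an average-length query (hypothetical numbers).
# y is the leaked neighboring dataset: all documents except A's.
doc_lengths_y = [120, 87, 243, 56]
f_y = sum(doc_lengths_y) / len(doc_lengths_y)  # f(y) = 126.5

f_x = 131.2                    # exact answer f(x) from a curator without DP
n_x = len(doc_lengths_y) + 1   # |x| = |y| + 1, since x additionally holds A

# A's private document length follows exactly from the two answers:
length_A = f_x * n_x - f_y * len(doc_lengths_y)
print(length_A)                # 150.0 -- A's privacy is breached
```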
Definition 2.2. A randomized algorithm M : X → Z takes an input value x ∈ X and outputs a value z ∈ Z non-deterministically, e.g., by drawing from a certain probability distribution.
Typically, randomized algorithms are parameterized by a density (for z ∈ R^n) or a discrete distribution (for categorical or binary z). The randomized algorithm 'perturbs' the input by drawing from that distribution. We suggest consulting Igamberdiev and Habernal (2021) for yet another NLP introduction to differential privacy.

Definition 2.3. A randomized algorithm M satisfies (ε, 0)-differential privacy if and only if for any neighboring datasets x, y ∈ X from the domain of M, and for any possible output z ∈ Z from the range of M, it holds that

Pr[M(x) = z] ≤ exp(ε) · Pr[M(y) = z]

where Pr[·] denotes probability and ε ∈ R^+ is the privacy budget. A smaller ε means stronger privacy protection, and vice versa (Wang et al., 2020; Dwork and Roth, 2013).
In words, to protect each individual's privacy, DP adds randomness when answering queries such that the query results are 'similar' for any pair of neighboring datasets. For our example of the average document length, the true average length would be randomly 'noisified'.
Another view on (ε, 0)-DP is to treat M(x) and M(y) as two probability distributions. Then (ε, 0)-DP puts an upper bound ε on the Max Divergence D∞(M(x) ‖ M(y)), that is, the maximum 'difference' of any output of the two random variables. Differential privacy also has a Bayesian interpretation, which compares the adversary's prior with the posterior after observing the values; the odds ratio is bounded by exp(ε), see Mironov (2017, p. 266).
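The Max Divergence bound can be made concrete with a small numeric sketch; the scale and the two centers below are illustrative assumptions:

```python
import numpy as np

b = 2.0                  # Laplace noise scale, e.g., Delta f / eps
mu_x, mu_y = 0.0, 1.0    # illustrative query answers f(x) and f(y)
z = np.linspace(-10.0, 10.0, 100_001)

# log p(M(x) = z) - log p(M(y) = z) for two Laplace densities with scale b;
# the normalization constants cancel in the ratio.
log_ratio = (np.abs(z - mu_y) - np.abs(z - mu_x)) / b
print(log_ratio.max())   # 0.5 = |mu_x - mu_y| / b, the D_inf upper bound
```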
Neighboring datasets and local DP The original definition of neighboring datasets (Def. 2.1) is usually adapted to the particular scenario; see Desfontaines and Pejó (2020) for a thorough overview. So far, we have shown the global DP scenario with a trusted curator holding a database of |x| individuals. The size of the database can be arbitrary, even a single individual, that is |x| = 1. In this case, we say a dataset y ∈ X is neighboring if it contains another single individual (y ∈ X, |y| = 1). This setup allows us to proceed without the trusted curator, as each individual queries their own single record and returns a differentially private output; this scenario is known as local DP.
In local differential privacy, where there is no central database of records, any pair of data points (examples, input values, etc.) is considered neighboring (Wang et al., 2020). This also holds for ADePT: using the DP terminology, any two utterances x, y are neighboring datasets (Krishna et al., 2021).
Definition 2.4. Let x, y ∈ X be neighboring datasets. The ℓ1-sensitivity of a function f : X → R^n is defined as

∆f = max_{x,y} ‖f(x) − f(y)‖_1

(Dwork and Roth, 2013, p. 31).

Definition 2.5. The Laplace density with scale b centered at µ is defined as

Lap(z; µ, b) = (1 / 2b) · exp( −|z − µ| / b )   (3)

Definition 2.6. Laplace mechanism (Dwork and Roth, 2013, p. 32). Given any function f : X → R^n, the Laplace mechanism is defined as

M_L(x, f, ε) = f(x) + (Y_1, …, Y_n)

where the Y_i are i.i.d. random variables drawn from Lap(z; 0, ∆f/ε). An analogous definition centers the Laplace noise directly at the function's output, that is

M_L(x, f, ε)_i ∼ Lap(z; f(x)_i, ∆f/ε)

From Definition 2.6 it also immediately follows that at a point z ∈ R^n, the density value of the Laplace mechanism is

p(M_L(x, f, ε) = z) = ∏_{i=1}^{n} (ε / 2∆f) · exp( −ε |f(x)_i − z_i| / ∆f )   (7)

Theorem 2.1. The Laplace randomized algorithm preserves (ε, 0)-DP (Dwork and Roth, 2013).
As ADePT relies on the proof of the Laplace mechanism, we show the full proof in Appendix A.
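For concreteness, here is a minimal numpy sketch of the Laplace mechanism from Definition 2.6; the query value, its sensitivity, and ε are assumptions for illustration only:

```python
import numpy as np

def laplace_mechanism(f_x: np.ndarray, sensitivity: float, eps: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Release f(x) + (Y_1, ..., Y_n) with Y_i ~ Lap(0, sensitivity/eps)."""
    return f_x + rng.laplace(loc=0.0, scale=sensitivity / eps,
                             size=f_x.shape)

rng = np.random.default_rng(0)
f_x = np.array([126.5])  # e.g., a true average document length
# Delta f must be derived from f and the neighboring-dataset notion;
# here we simply assume Delta f = 1.0 for illustration.
print(laplace_mechanism(f_x, sensitivity=1.0, eps=0.5, rng=rng))
```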

ADePT by Krishna et al. (2021)
Let u be an input text (a sequence of words or a vector, for example; this is not key to the main argument). Enc is an encoder function from input u to a latent representation vector r ∈ R n where n is the number of dimensions of that latent space. Dec is a decoder from the latent representation back to the original input space (again, a sequence of words or a vector). What we have so far is a standard auto-encoder, such that r = Enc(u) and v = Dec(r).
Krishna et al. (2021) define ADePT as a randomized algorithm that, given an input u, generates v as v = Dec(r′), where r′ ∈ R^n is a clipped latent representation vector with added noise:

r′ = r · min(1, C / ‖r‖_2) + η   (9)

where η ∈ R^n is a noise vector, C ∈ R is an arbitrary clipping constant, and ‖·‖_2 is the ℓ2 (Euclidean) norm.⁴

Theorem 3.1 (Krishna et al., 2021). If η is a multidimensional noise, such that each element η_i is independently drawn from the distribution shown in Eq. 10,

η_i ∼ exp( −ε |v_i| / 2C )   (10)

then the transformation from u → v is (ε, 0)-DP.

⁴ We contacted the authors several times to double-check that this formula is correct without a potential typo but got no response. However, other parts of the paper give evidence it is correct, e.g., the authors use an analogy to a hyper-sphere, which is considered Euclidean by default.
Proof. Krishna et al. (2021) refer to the proof of Theorem 3.6 by Dwork and Roth (2013, p. 32), which is the proof of the Laplace mechanism.
First, v_i in Eq. 10 is ambiguous, as it 'semantically' relates to v, which is the decoded vector that only comes after drawing a random value; moreover, η and v have different dimensions. Given that the authors employ Laplace noise and base their proofs on Theorem 3.6 from Dwork and Roth (2013, p. 32), we believe that Eq. 10 is meant to be the standard Laplace mechanism, such that each value η_i is drawn independently from a zero-centered Laplace distribution parametrized by scale b (Definition 2.6):

η_i ∼ Lap(z; 0, b)   (11)

Given the density from Eq. 3, we rewrite Eq. 11 as

p(η_i = z) = (1 / 2b) · exp( −|z| / b )   (12)

Second, since the Laplace mechanism requires the noise scale b = ∆f/ε, the scale 2C/ε in Eq. 10 implicitly makes the following sensitivity claim.

Theorem 3.2 (implicit in Krishna et al., 2021). Let f : R^n → R^n be the clipping function

f(x) = x · min(1, C / ‖x‖_2)   (13)

Then the ℓ1-sensitivity of f is ∆f = 2C.

Thus, by plugging the sensitivity ∆f from Theorem 3.2 into Eq. 12, we obtain

p(η_i = z) = (ε / 4C) · exp( −ε |z| / 2C )

In other words, the claim of Krishna et al. (2021) is that if each η_i is drawn from a Laplace distribution with scale 2C/ε, their mechanism is (ε, 0)-differentially private.
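Under this reading, ADePT's randomization step (Eqs. 9 and 10) can be sketched as follows; the encoder is mocked by a random vector and all parameter values are assumptions:

```python
import numpy as np

def adept_randomizer(r: np.ndarray, C: float, eps: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Eq. 9: clip the latent vector r to L2 norm at most C, then add
    per-dimension Laplace noise with scale b = 2C/eps (our reading of
    Eq. 10). Dec(r') would then be applied to the returned vector."""
    r_clipped = r * min(1.0, C / np.linalg.norm(r, ord=2))
    noise = rng.laplace(loc=0.0, scale=2.0 * C / eps, size=r.shape)
    return r_clipped + noise

rng = np.random.default_rng(1)
r = rng.normal(size=32)   # stand-in for Enc(u)
print(adept_randomizer(r, C=1.0, eps=1.0, rng=rng)[:4])
```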

ADePT with Laplace mechanism is not differentially private
Proof. Following the proof of Theorem 2.1, the bound in Eq. 33 must hold for any x, y:

exp( ε ‖f(x) − f(y)‖_1 / ∆f ) ≤ exp(ε)

and thus this inequality must hold too:

‖f(x) − f(y)‖_1 ≤ ∆f = 2C   (17)

Fix the clipping constant C > 0 arbitrarily (C ∈ R) and set the number of dimensions to n = 2. Let r_y = (2C/3, 2C/3) be the input y of the clipping function f from Eq. 13:

f(y) = r_y · min(1, C / ‖r_y‖_2)   (from Eq. 13)
     = r_y · min(1, 1.06066…)
     = r_y · 1 = (2C/3, 2C/3)   (18)

Analogously, let r_x = (−2C/3, −2C/3), so that f(x) = (−2C/3, −2C/3). Then the ℓ1 distance is

‖f(x) − f(y)‖_1 = |−2C/3 − 2C/3| + |−2C/3 − 2C/3| = 8C/3 > 2C

which contradicts Eq. 17. The claimed sensitivity ∆f = 2C is therefore wrong, so the noise scale 2C/ε does not satisfy the preconditions of the Laplace mechanism and ADePT's mechanism is not (ε, 0)-DP.
In general, it is the inequality ‖x‖_1 ≥ ‖x‖_2 that makes ADePT fail the DP proof.
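The counterexample is also easy to verify numerically; a minimal sketch:

```python
import numpy as np

def clip(r: np.ndarray, C: float) -> np.ndarray:
    """The clipping function f from Eq. 13."""
    return r * min(1.0, C / np.linalg.norm(r, ord=2))

C = 1.0
r_y = np.array([2 * C / 3, 2 * C / 3])     # ||r_y||_2 < C, so no clipping
r_x = np.array([-2 * C / 3, -2 * C / 3])

l1_dist = np.sum(np.abs(clip(r_x, C) - clip(r_y, C)))
print(l1_dist, ">", 2 * C)   # 2.666... > 2.0: the claimed bound is violated
```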

Actual sensitivity of ADePT
Theorem 5.1. Let f : R^n → R^n be the function defined in Eq. 13. The ℓ1-sensitivity ∆f of this function is 2C√n.
Proof. See Appendix B.
Corollary 5.1. Since 2C√n = 2C only for n = 1, ADePT could be differentially private only if the encoder's latent representation r = Enc(u) were a single scalar.
Since Krishna et al. (2021) do not specify the dimensionality of their encoder's output, we can only assume some typical values in the range from 32 to 1024, so that the true sensitivity of ADePT is ≈ 6 to 32 times higher than reported (√32 ≈ 5.66, √1024 = 32).
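To put Theorem 5.1 into numbers, the factor by which the true sensitivity 2C√n exceeds the assumed 2C is exactly √n:

```python
import math

# Ratio of the true sensitivity 2C*sqrt(n) to the assumed 2C for
# typical encoder output dimensionalities.
for n in (1, 32, 128, 512, 1024):
    print(f"n = {n:4d}: true/assumed sensitivity = {math.sqrt(n):5.2f}x")
# n = 1 is the only case where ADePT's noise scale would be correct.
```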
Magnitude of non-protected data

How many data points actually violate the privacy guarantees? Without access to the trained model and its hyper-parameters (C, in particular), it is hard to reason about the properties of the latent space where the privatization occurs. We thus simulated the encoder's 'unclipped' vector outputs r by sampling 10k vectors from two distributions: 1) uniform within (−C, +C) in each dimension, and 2) zero-centered normal with σ² = 0.1·C. Especially the latter is rather optimistic, as it samples most vectors close to zero; in reality, these latent space vectors are unbounded.
Each pair of such vectors in the latent space, after clipping but before applying DP (Eq. 13), constitutes 'neighboring datasets', so their ℓ1 distance must be bounded by the sensitivity (2C, as claimed in Theorem 3.2) in order to satisfy DP with the Laplace mechanism.
We ran the simulation for increasing dimensionality of the encoder's output and measured how many pairs violate the sensitivity bound.⁵ Fig. 1 shows the resulting 'curse of dimensionality' for ℓ1 norms. Even for a considerably small encoder vector size of 32 and an unbounded encoder latent space, almost none of the data points would be protected by ADePT's Laplace mechanism.

⁵ Code available at https://github.com/habernal/emnlp2021-differential-privacy-nlp
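The following is a simplified re-implementation of this simulation; it is our own sketch with a smaller sample than the 10k vectors above (see the repository in footnote 5 for the original code):

```python
import numpy as np

def violation_rate(n_dim: int, C: float = 1.0, n_vecs: int = 300,
                   seed: int = 0) -> float:
    """Fraction of clipped-vector pairs whose L1 distance exceeds the
    claimed sensitivity 2C; latent vectors ~ uniform(-C, C) (setup 1)."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(-C, C, size=(n_vecs, n_dim))
    norms = np.linalg.norm(r, ord=2, axis=1, keepdims=True)
    clipped = r * np.minimum(1.0, C / norms)   # the clipping of Eq. 13
    # pairwise L1 distances between all clipped vectors
    d = np.abs(clipped[:, None, :] - clipped[None, :, :]).sum(axis=2)
    pairs = d[np.triu_indices(n_vecs, k=1)]
    return float(np.mean(pairs > 2 * C))

for n in (2, 8, 32):
    print(n, violation_rate(n))   # the rate approaches 1.0 as n grows
```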

Discussion
Local DP differs from centralized DP in that there is no central database, and once the privatized data item 'leaves' an individual, it stays so forever. This makes typical membership inference attacks unsuitable: no matter what happens to the rest of the world, the probability of inferring the individual's true value after observing their privatized data item is bounded by exp(ε). For example, the ATIS dataset used in ADePT contains 5,473 utterances of lengths 1 to 46 tokens, with a quite limited vocabulary of 941 words. In theory, the search space of all possible utterances would be of size 941^46 ≈ 6 × 10^136, and under ε-DP all of them are multiplicatively indistinguishable. For example, after observing "on april first i need a ticket from tacoma to san jose departing before 7 am" as ADePT's privatized autoencoder output, the true input might well have been "on april first i need a flight going from phoenix to san diego" or "monday morning i would like to fly from columbus to indianapolis", and our posterior certainty of any of those is limited by the privacy bound. However, since the outputs of ADePT leak privacy, attacks are possible. We sketch a potential scenario in Appendix C.
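The size of that search space is a quick back-of-the-envelope computation:

```python
import math

# Naive search space of ATIS-like utterances: a vocabulary of 941 words
# and up to 46 tokens per utterance.
print(46 * math.log10(941))   # 136.78..., i.e., 941**46 ~ 6 * 10**136
```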
There are two possible remedies for ADePT: either the latent vector clipping in Eq. 9 could use the ℓ1 norm, or the Laplace noise in Eq. 10 could use the correct sensitivity as determined in Theorem 5.1. In either case, the utility on the downstream tasks as presented by Krishna et al. (2021) is expected to be worse due to the much larger amount of required noise.
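Both remedies amount to one-line changes of the randomization step; a hedged sketch, assuming the rest of ADePT stays as described in Section 3:

```python
import numpy as np

def adept_l1_clip(r: np.ndarray, C: float, eps: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Remedy 1: clip to the L1 norm, so the L1-sensitivity really is 2C
    and the original noise scale b = 2C/eps becomes valid."""
    r_clipped = r * min(1.0, C / np.linalg.norm(r, ord=1))
    return r_clipped + rng.laplace(0.0, 2.0 * C / eps, size=r.shape)

def adept_corrected_scale(r: np.ndarray, C: float, eps: float,
                          rng: np.random.Generator) -> np.ndarray:
    """Remedy 2: keep the L2 clipping of Eq. 9 but calibrate the noise
    to the true sensitivity 2C*sqrt(n) of Theorem 5.1."""
    n = r.shape[0]
    r_clipped = r * min(1.0, C / np.linalg.norm(r, ord=2))
    scale = 2.0 * C * np.sqrt(n) / eps
    return r_clipped + rng.laplace(0.0, scale, size=r.shape)
```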

Conclusion
This paper revealed a potential trap for NLP researchers when adopting a local DP approach. We believe it contributes to a better understanding of the exact modeling choices involved in determining the sensitivity of local DP algorithms. We hope that DP will become a widely accessible and well-understood framework within the NLP community.
A Proof of Theorem 2.1

Lemma A.1. For any a, b, c ∈ R, it holds that |a − b| − |c − b| ≤ |a − c|.

Proof. Directly based on the triangle inequality.
Corollary A.1. Definition 2.4 implies that ∆f is an upper bound on the ℓ1 distance of the function's outputs for any neighboring x and y. In other words,

‖f(x) − f(y)‖_1 ≤ ∆f

The actual proof (Dwork and Roth, 2013). We will prove that for any x, y, the following ratio is bounded by exp(ε) and thus satisfies Definition 2.3:

p(M_L(x, f, ε) = z) / p(M_L(y, f, ε) = z) ≤ exp(ε)   (27)

Fix z ∈ R^n arbitrarily. By plugging Eq. 7 into Eq. 27, we get

p(M_L(x, f, ε) = z) / p(M_L(y, f, ε) = z)
  = ∏_{i=1}^{n} [ exp( −ε |f(x)_i − z_i| / ∆f ) / exp( −ε |f(y)_i − z_i| / ∆f ) ]
  = ∏_{i=1}^{n} exp( ε ( |f(y)_i − z_i| − |f(x)_i − z_i| ) / ∆f )
  ≤ ∏_{i=1}^{n} exp( ε |f(x)_i − f(y)_i| / ∆f )      (by Lemma A.1)
  = exp( ε ‖f(x) − f(y)‖_1 / ∆f )
  ≤ exp(ε)                                           (by Corollary A.1)   (33)

which is what we wanted. By symmetry, we get the proof for p(M_L(y, f, ε) = z) / p(M_L(x, f, ε) = z) ≤ exp(ε).
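The ratio bound of Eq. 27 can also be probed numerically for a toy function; the values below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
eps, delta_f = 0.5, 1.0
f_x = np.array([0.3, -0.2])   # toy outputs with ||f(x) - f(y)||_1 = 1.0,
f_y = np.array([0.8, 0.3])    # i.e., exactly the sensitivity delta_f
b = delta_f / eps

z = rng.normal(size=(100_000, 2))           # probe many output points z
log_px = -np.abs(z - f_x).sum(axis=1) / b   # log-density up to a constant
log_py = -np.abs(z - f_y).sum(axis=1) / b   # (the constants cancel below)
print(np.exp((log_px - log_py).max()))      # <= exp(eps) = 1.6487...
```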

B Proof of Theorem 5.1
Proof. The definition of sensitivity corresponds to the maximum ℓ1 distance of any two vectors in R^n from the range of f. As Eq. 13 bounds all vectors by their ℓ2 (Euclidean) norm, we want to find two opposing points on an n-dimensional sphere of radius C that have the maximal ℓ1 distance. Let n ∈ N_{>0} be the number of dimensions and C ∈ R a positive constant. We solve the following optimization problem:

max_{x_1, …, x_n}  |x_1| + ⋯ + |x_n|   s.t.   √(x_1² + ⋯ + x_n²) = C

By symmetry (or by the Cauchy–Schwarz inequality, ‖x‖_1 ≤ √n · ‖x‖_2), the maximum is attained when |x_i| = C/√n for every i, so that

max ‖x‖_1 = n · (C/√n) = C√n   (36)
Now let x, x′ ∈ R^n be two points that attain this maximum ℓ1 norm (Eq. 36) and whose ℓ2 norm is C (that is, possible outputs of the clipping function f in Eq. 13):

x = (C/√n, …, C/√n),   x′ = (−C/√n, …, −C/√n)

Then their ℓ1 distance is

‖x − x′‖_1 = ∑_{i=1}^{n} |C/√n + C/√n| = n · (2C/√n) = 2C√n
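A quick numerical sanity check of this maximizer (our own sketch):

```python
import numpy as np

n, C = 32, 1.0
rng = np.random.default_rng(4)

# Random points on the L2 sphere of radius C never exceed the analytic
# maximum L1 norm C*sqrt(n) from Eq. 36.
pts = rng.normal(size=(100_000, n))
pts = C * pts / np.linalg.norm(pts, ord=2, axis=1, keepdims=True)
print(np.abs(pts).sum(axis=1).max())      # < C*sqrt(n) = 5.6568...

x_star = np.full(n, C / np.sqrt(n))       # the maximizer from Appendix B
print(np.abs(x_star - (-x_star)).sum())   # 2C*sqrt(n) = 11.3137...
```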

C Potential attacks
Here we only sketch a potential attack on a single individual's privatized output v. We do not speculate on its actual feasibility, as differential privacy operates with the worst-case scenario, that is, the theoretical possibility that the adversary has unlimited computing power and unlimited background knowledge. However, real-life examples show that anything less protective than DP can be attacked, and it is mostly a matter of resources. We assume access to the trained ADePT autoencoder as well as the ATIS corpus (without the single individual whose value we try to infer, to be fair). First, we would need to find the privatized latent vector of v, that is r′, which could be possible by exploiting and probing the model. Second, by employing a brute-force attack, we can train a language model on ATIS to generate a feasible search space of input utterances, project them into the latent space, and explore the neighborhood of r′. This would drastically reduce the search space. Then, depending on the geometric properties of that latent space, it might be the case that 'similar' utterances are closer to each other, increasing the probability of finding a similar utterance that might be a 'just good enough' approximation for the adversary.