Differential Privacy for Text Analytics via Natural Text Sanitization

Texts convey sophisticated knowledge, but they also convey sensitive information. Despite the success of general-purpose language models and domain-specific mechanisms with differential privacy (DP), existing text sanitization mechanisms still provide low utility, cursed by the high-dimensional text representation. The companion issue of utilizing sanitized texts for downstream analytics is also under-explored. This paper takes a direct approach to text sanitization. Our insight is to consider both sensitivity and similarity via our new local DP notion. The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility. Surprisingly, the high utility does not boost the success rate of inference attacks.


Introduction
Natural language processing (NLP) requires a lot of training data, which can be sensitive. Naïve redaction approaches (e.g., removing common personally identifiable information) are known to fail (Sweeney, 2015): innocuous-looking fields can be linked to other information sources for re-identification. The recent success of many language models (LMs) has motivated security researchers to devise advanced privacy attacks. Carlini et al. (2020b) recover texts from (a single document of) the training data via querying an LM pretrained on it. Pan et al. (2020) and Song and Raghunathan (2020) target the text embedding, e.g., revealing information from an encoded query sent to an NLP service.

Figure 1: Workflow of our PPNLP pipeline, including the user-side sanitization and the service provider-side NLP modeling with pretraining/fine-tuning

Emerging NLP works focus only on specific document-level (statistical) features (Weggenmann and Kerschbaum, 2018) or on producing private text representations (Xie et al., 2017; Coavoux et al., 2018; Elazar and Goldberg, 2018) as initial solutions to the first issue above on training-data privacy. However, the learned representations are not human-readable, which makes transparency (e.g., as required by GDPR) questionable: an average user may not have the technical know-how to verify whether sensitive attributes have been removed. Moreover, considering the whole NLP pipeline, the learned representations often entail extra modeling or non-trivial changes to existing NLP models, which take dedicated engineering efforts.
In contrast, our sanitized documents, being texts themselves, incur minimal changes to existing NLP pipelines. Being human-readable, they provide transparency (to privacy-concerning training-data contributors) and explainability (e.g., to linguists who might need to investigate how the training data contribute to a certain result). Moreover, our approach naturally extends the privacy protection to the inference phase: users can apply our sanitization mechanism before sending queries (e.g., medical history) to the NLP service provider (e.g., a diagnosis service).
Conceptually, we take a natural approach: we sanitize text documents into (sanitized) text documents. This is in great contrast to the typical "post-processing" that injects noise either into gradients when training (a deep neural network) (McMahan et al., 2018) or into the "cursed" high-dimensional text representations (Lyu et al., 2020a,b; Feyisetan et al., 2020). It also leads to our O(1) efficiency, freeing us from re-synthesizing the document word-by-word via nearest-neighbor searches over the entire vocabulary space V (Feyisetan et al., 2020).
Technically, we aim for the de facto standard of local differential privacy (LDP) (Duchi et al., 2013) to sanitize the user data locally, based on which the service provider can build NLP models without touching any raw data. DP has been successful in many contexts, e.g., location privacy and survey statistics (Murakami and Kawamoto, 2019). However, DP text analytics appears to be a difficult pursuit (as discussed; also see Section 2), which probably explains why there are only a few works on DP-based text sanitization. In high-level terms, text is rich in semantics, differentiating it from other, more structured data.
Our challenge here is to develop efficient and effective mechanisms that preserve the utility of the text data with provable and quantifiable privacy guarantees. Our insight is the formulation of a new LDP notion named Utility-optimized Metric LDP (UMLDP). We attribute our success to UMLDP's focus on protecting what matters (sensitive words) by "sacrificing" the privacy of non-sensitive (common) words. To achieve UMLDP, our mechanism directly samples noise over tokens.
Our result in this regard is already better than the state-of-the-art LDP solution producing sanitized documents (Feyisetan et al., 2020): we obtain a 28% gain in accuracy on the SST-2 dataset (Wang et al., 2019) on average at the same privacy level (i.e., the same LDP parameter) while being much more efficient (∼60× faster, precomputation included).

Privacy-Preserving NLP, Holistically
Text sanitization is essential but just one piece of the whole privacy-preserving NLP (PPNLP) pipeline. While most prior works in text privacy are motivated by producing useful data for some downstream tasks, the actual text analytics are hardly explored, let alone in the context of many recent general-purpose language models. As simple as it might seem, we start to see design choices that can be influential. Specifically, our challenge here is to adapt the currently dominating pretraining-fine-tuning paradigm (e.g., BERT (Devlin et al., 2019)) to sanitized texts for building the model.
Our design is to build in privacy at the root again, in contrast to the afterthought approach. We found it beneficial to sanitize even the public data before feeding them to training. It is not for protecting the public data per se. The intuition here is that it "prepares" the model to work with sanitized queries, which explains our eventual (slight) increase in accuracy while additionally ensuring privacy.
Specifically, we propose a sanitization-aware pretraining procedure (Figure 1). We first use our mechanisms to sanitize the public texts, mask the sanitized texts (as in BERT), and train the LM by predicting each MASK position as its original unsanitized token. LMs pretrained with our sanitization-aware procedure are expected to be more robust to noise in the sanitized texts and achieve better utility when fine-tuned on downstream tasks.
We conduct experiments on three representative NLP tasks to empirically confirm that our proposed PPNLP pipeline preserves both utility and privacy. It turns out that our sanitization-aware pretraining (using only 1/6 of the data used in the original BERT pretraining) can even improve the utility of NLP tasks while maintaining privacy comparable to the original BERT. Note that there is an inherent tension between utility and privacy, and a privacy attack is also inference in nature. To empirically demonstrate the privacy aspect of our pipeline, i.e., that it does not make our model a more powerful tool for the attacker, we also conduct a "mask token inference" attack on private texts, which infers a masked token given its context based on BERT. As a highlight, our base solution SANTEXT improves the defense rate by 20% with only a 4% utility loss on the SST-2 dataset. We attribute this surprising result of mostly helping only the good guys to our natural approach: to avoid the model memorizing sensitive texts "too well," we feed it sanitized text.

Related Work
Privacy risks in NLP. A taxonomy of attacks that recover sensitive attributes or partial raw text from text embeddings output by popular LMs has been proposed (Song and Raghunathan, 2020), without any assumptions on the structures or patterns in input text. Carlini et al. (2020b) also show a powerful black-box attack on GPT-2 (Radford et al., 2019) that extracts verbatim texts of training data. Defense with rigorous guarantees (DP) is thus vital.
Differential privacy and its application in NLP. DP (Dwork, 2006) has emerged as the de facto standard for statistical analytics (Wang et al., 2017, 2018; Cormode et al., 2018). A few efforts inject high-dimensional DP noise into text representations (Feyisetan et al., 2019, 2020; Lyu et al., 2020a,b). The noisy representations are not human-readable and not directly usable by existing NLP pipelines, i.e., they consider a different problem not directly comparable to ours. More importantly, they fail to strike a nice privacy-utility balance due to "the curse of dimensionality": the magnitude of the noise is too large for high-dimensional token embeddings, and it becomes exponentially less likely to find a noisy embedding close to a real one on every dimension. This may also explain why an earlier work focuses on document-level statistics only, e.g., term-frequency vectors (Weggenmann and Kerschbaum, 2018).
Our approaches produce natively usable sanitized texts via directly sampling a substitution for each token from a precomputed distribution (to be detailed in Section 4), circumventing the dimension curse and striking a privacy-utility tradeoff while being much more efficient. A concurrent work (Qu et al., 2021) also considers the whole NLP pipeline, but it still builds on the token-projection approach (Feyisetan et al., 2020).
Privacy-preserving text representations. Learning private text representations via adversarial training is also an active area (Xie et al., 2017; Coavoux et al., 2018; Elazar and Goldberg, 2018). An adversary is trained to infer sensitive information jointly with the main model, while the main model is trained to maximize the adversary's loss and minimize the primary learning objective. While we share the same general goal, our aim is not such representations (similar to those with DP) but to release sanitized text for general purposes.

Defining (Local) Differential Privacy
Suppose each user holds a document D = ⟨x_i⟩_{i=1}^{L} of L tokens (each of which can be a character, a subword, a word, or an n-gram), where x_i is from a vocabulary V of size |V|. For privacy, each user derives a sanitized version D̃ by running a common text sanitization mechanism M over D on local devices. Specifically, M works by replacing every token x_i in D with a substitution y_i ∈ V, assuming that x_i itself is unnecessary for NLP tasks while its semantics should be preserved for high utility. The sanitized D̃ is then shared with an NLP service provider.
We consider a typical threat model in which each user does not trust any other party and views them as an attacker with access to D̃ in conjunction with any auxiliary information (including M).

(Variants of) Local Differential Privacy
Let X and Y be the input and output spaces. A randomized mechanism M : X → Y is a probabilistic function that assigns a random output y ∈ Y to an input x ∈ X . Every y induces a probability distribution on the underlying space. For sanitizing text, we set both X and Y as the vocabulary V.
Definition 1 (ε-LDP (Duchi et al., 2013)). Given a privacy parameter ε ≥ 0, M satisfies ε-local differential privacy (ε-LDP) if, for any x, x′, y ∈ V,

Pr[M(x) = y] ≤ e^ε · Pr[M(x′) = y].

Given an observed output y, from the attacker's view, the likelihoods that y is derived from x and from x′ are similar. A smaller ε means better privacy due to a higher indistinguishability level of output distributions, yet the outputs retain less utility.
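As a toy illustration of Definition 1 (not one of the paper's mechanisms), k-ary randomized response over a small vocabulary satisfies ε-LDP; the sketch below verifies the defining inequality numerically. The vocabulary, the ε value, and the helper name `rr_probs` are illustrative.

```python
import math

def rr_probs(x, vocab, eps):
    """k-ary randomized response: keep x with probability
    e^eps / (e^eps + k - 1); otherwise output one of the other
    k - 1 tokens uniformly. This mechanism satisfies eps-LDP."""
    k = len(vocab)
    denom = math.exp(eps) + k - 1
    return {y: (math.exp(eps) / denom if y == x else 1.0 / denom)
            for y in vocab}

vocab = ["flu", "cold", "fracture", "healthy"]
eps = 1.0
# eps-LDP check: Pr[M(x) = y] <= e^eps * Pr[M(x') = y] for all x, x', y
for y in vocab:
    for x1 in vocab:
        for x2 in vocab:
            p1 = rr_probs(x1, vocab, eps)[y]
            p2 = rr_probs(x2, vocab, eps)[y]
            assert p1 <= math.exp(eps) * p2 + 1e-12
```

The exhaustive check over all input pairs mirrors the universal quantifier in the definition; it passes precisely because the keep/other probability ratio equals e^ε.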
ε-LDP is a very strong privacy notion for its homogeneous protection over all input pairs. However, this is also detrimental to the utility: no matter how unrelated x and x′ are, their output distributions must be similar. As a result, a sanitized token y may not (approximately) capture the semantics of its input x, degrading the downstream tasks.
Definition 2 (MLDP). Given ε ≥ 0 and a distance metric d : V × V → R_{≥0} over V, M satisfies MLDP, or ε·d(x, x′)-LDP, if, for any x, x′, y ∈ V,

Pr[M(x) = y] ≤ e^{ε·d(x, x′)} · Pr[M(x′) = y].

When d(x, x′) = 1 for all x ≠ x′, MLDP becomes ε-LDP. For MLDP, the indistinguishability of output distributions is further scaled by the distance between the respective inputs. Roughly, the effect of ε becomes "adaptive." To apply MLDP, one needs to carefully define the metric d (see Section 4.2).

Incorporating ULDP to further improve utility. Utility-optimized LDP (ULDP) (Murakami and Kawamoto, 2019) also relaxes LDP; it was originally proposed for aggregating ordinal responses. It exploits the fact that different inputs have different sensitivity levels to achieve higher utility. By assuming that the input space is split into sensitive and non-sensitive parts, ULDP achieves a privacy guarantee equivalent to LDP for sensitive inputs.
In our context, more formally, let V_S ⊆ V be the set of sensitive tokens common to all users, and V_N = V \ V_S be the set of remaining tokens. The output space V is split into the protected part V_P ⊆ V and the unprotected part V_U = V \ V_P. The image of V_S is restricted to V_P, i.e., a sensitive x ∈ V_S can only be mapped to a protected y ∈ V_P. For text, we can set V_S = V_P for simplicity. While a non-sensitive x ∈ V_N can be mapped to V_P, every y ∈ V_U must be mapped from V_N, which helps to improve the utility.

Our New Utility-optimized MLDP Notion
Among the many variants of (L)DP notions, we found the above two (i.e., ULDP and MLDP) provide useful insights for quantifying the semantics and privacy of text data. We thus formulate the new privacy notion of utility-optimized MLDP (UMLDP).

Definition 3 (UMLDP). Given V_S ∪ V_N = V, two privacy parameters ε, ε_0 ≥ 0, and a distance metric d : V × V → R_{≥0} over V, M satisfies (V_S, V_P, ε, ε_0)-UMLDP if: i) for any x, x′ ∈ V and any y ∈ V_P,

Pr[M(x) = y] ≤ e^{ε·d(x, x′) + ε_0} · Pr[M(x′) = y];

and ii) for any y ∈ V_U, there is an x ∈ V_N such that Pr[M(x) = y] > 0 and Pr[M(x′) = y] = 0 for any x′ ≠ x.

Figure 2: Overview of our new UMLDP notion

Figure 2 summarizes the treatment of UMLDP. It exhibits "invertibility," i.e., every y ∈ V_U must be "noise-free" and mapped deterministically via an invertible map. Apart from generalizing the ε in the ULDP definition (recalled in Appendix A.1) into ε·d(x, x′), we incorporate an additive bound ε_0 due to the invertibility, which makes the derivation of ε easier. Looking ahead, ε_0 appears naturally in the analysis of our UMLDP mechanism for the invertible case.
UMLDP (and MLDP), as LDP notions, satisfy composability and free post-processing. The former means that the sequential execution of an ε_1-LDP mechanism and an ε_2-LDP mechanism satisfies (ε_1 + ε_2)-LDP, i.e., ε can be viewed as the privacy "budget" of a sophisticated task comprising multiple subroutines, each consuming a part of ε such that their sum equals ε. The latter means that further processing the mechanism outputs incurs no extra privacy loss.

Overview
We propose two token-wise sanitization methods with (U)MLDP: SANTEXT and SANTEXT + , which build atop a variant of the exponential mechanism (EM) (McSherry and Talwar, 2007) over the "native" text tokens as both input and output spaces to avoid going to the "cursed dimensions" of token embeddings. EM samples a replacement y for an input x based on an exponential distribution, with more "suitable" y's sampled with higher probability (detailed below). It is well-suited for (U)MLDP by considering the "suitability" as how well the semantics of x is preserved for the downstream tasks (run over the sanitized text y) to remain accurate.
To quantify this, we utilize an embedding model mapping tokens into a real-valued vector space. The semantic similarity among tokens can then be measured via the Euclidean distance between their corresponding vectors. Our base design SANTEXT outputs y with probability inversely proportional to the distance between x and y: the shorter the distance, the more semantically similar they are. SANTEXT+ considers some tokens V_N in V to be non-sensitive and runs SANTEXT over the sensitive part V_S = V \ V_N (i.e., it degenerates to SANTEXT if V_S = V). For V_N, we tailor a probability distribution to provide UMLDP as a whole.
Algorithm 1: SANTEXT
Input: Private document D = ⟨x_i⟩_{i=1}^{L}, embedding model φ, and a privacy parameter ε ≥ 0
Output: Sanitized document D̃
1 Derive token vectors φ(x_i) for i ∈ [1, L];
2 for i = 1, . . . , L do
3     Run M(x_i) to sample a sanitized token y_i with probability defined in Eq. (1);
4 end
5 Output sanitized D̃ as ⟨y_i⟩_{i=1}^{L};

With SANTEXT or SANTEXT+, each user sanitizes D into D̃ and uploads it to the service provider for performing any NLP task built atop a pretrained LM, e.g., BERT. Typically, the task pipeline consists of an embedding layer, an encoder module, and task-specific layers, e.g., for classification.
Without the raw text, the utility can degrade; we thus propose two approaches for improving it. The first is to pretrain only the encoder on the sanitized public corpus to adapt it to the noise; it is optional if pretraining is deemed costly. The second is to fine-tune the full pipeline on the D̃'s, which updates both the encoder and the task layers.

Base Sanitization Mechanism: SANTEXT
In NLP, a common step is to employ an embedding model¹ mapping semantically similar tokens to close vectors in a Euclidean space. Concretely, an embedding model is an injective mapping φ : V → R^m for dimensionality m. The distance between any two tokens x and x′ can be measured by the Euclidean distance of their embeddings: d(x, x′) = d_euc(φ(x), φ(x′)). As φ is injective, d satisfies the axioms of a distance metric.
Algorithm 1 lists the pseudo-code of SANTEXT for sanitizing a private document D at the user side. The first step is to use φ to derive the embedding of each token² x in D. Then, for each x, we run M(x) to sample a sanitized y with probability

Pr[M(x) = y] = C_x · e^{-(ε/2)·d_euc(φ(x), φ(y))},    (1)

where C_x is a normalizing factor: the closer y is to x, the more likely y is to replace x. To boost the sanitizing efficiency, upon obtaining φ(x), we can precompute a |V| × |V| probability matrix, where each entry (i, j) denotes the probability of outputting y_j on input x_i.

¹ We assume that it has been trained on a large public corpus and shared by all users.
² For easy presentation, we omit the subscript i later.
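The precompute-then-sample workflow of SANTEXT can be sketched in a few lines of Python, with toy random embeddings standing in for GloVe/BERT vectors (`precompute_probs` and `santext` are illustrative names, not the paper's code):

```python
import numpy as np

def precompute_probs(emb, eps):
    """Row i holds Pr[M(x_i) = y_j] proportional to
    exp(-(eps/2) * d_euc(phi(x_i), phi(y_j))), normalized over the
    vocabulary, as in Eq. (1)."""
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    scores = np.exp(-0.5 * eps * d)
    return scores / scores.sum(axis=1, keepdims=True)

def santext(doc, vocab, probs, rng):
    """Sanitize a document token by token using the precomputed matrix."""
    idx = {t: i for i, t in enumerate(vocab)}
    return [vocab[rng.choice(len(vocab), p=probs[idx[t]])] for t in doc]

rng = np.random.default_rng(0)
vocab = ["good", "great", "bad", "awful"]
emb = rng.normal(size=(len(vocab), 8))  # toy embeddings
P = precompute_probs(emb, eps=2.0)
sanitized = santext(["good", "bad", "bad"], vocab, P, rng)
```

Because the matrix is computed once, sanitizing each token afterwards is a single table lookup plus one draw, matching the O(1) per-token cost discussed above.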
Algorithm 2: SANTEXT+
Input: Private document D = ⟨x_i⟩_{i=1}^{L}, embedding model φ, a privacy parameter ε ≥ 0, probability p for a biased coin, and sensitive token set V_S
Output: Sanitized document D̃
1 Derive token vectors φ(x_i) for i ∈ [1, L];
2 for i = 1, . . . , L do
3     if x_i ∈ V_S then
4         Sample a substitution y_i ∈ V_P = V_S with probability given in Eq. (1);
5     else
6         With probability 1 − p, set y_i = x_i; otherwise, sample y_i ∈ V_P with probability given in Eq. (2);
7 end
8 Output sanitized D̃ as ⟨y_i⟩_{i=1}^{L}, which can be released to the service provider for NLP tasks;

Enhanced Mechanism: SANTEXT +
In SANTEXT, all tokens in V are treated as sensitive, which leads to excessive protection and utility loss. Following the less-is-more principle, we divide V into V S and V N , and focus on protecting V S .
Observing that the most frequently used tokens (e.g., a/an/the) are non-sensitive to virtually all users, we use token frequencies for the division. A simple strategy, which is also used in our experiments, is to mark the top-w fraction of low-frequency tokens (according to a certain corpus) as V_S, where w is a tunable parameter. Looking ahead, this "basic" method already shows promising results (further discussion can be found in Section 4.5).
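The frequency-based division can be sketched as follows (a simplified stand-in; `split_vocab` and the tiny corpus are illustrative):

```python
from collections import Counter

def split_vocab(corpus_tokens, w):
    """Mark the w fraction of lowest-frequency tokens as sensitive (V_S);
    the remaining high-frequency tokens form V_N."""
    freq = Counter(corpus_tokens)
    ordered = sorted(freq, key=lambda t: (freq[t], t))  # least frequent first
    cut = int(w * len(ordered))
    return set(ordered[:cut]), set(ordered[cut:])  # (V_S, V_N)

tokens = "the patient the the joined the the drug trial the".split()
v_s, v_n = split_vocab(tokens, w=0.8)  # frequent "the" ends up non-sensitive
```

In practice the frequencies would come from a large reference corpus rather than the document itself, so that all users share the same V_S.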
Algorithm 2 lists the pseudo-code of SANTEXT+, with V_S = V_P and V_N = V_U shared by all users. The first step, as in SANTEXT, is to derive the token embeddings in D. Then, for each token x, if it is in V_S, we sample its substitution y from V_P with probability given in Eq. (1); this is equivalent to running SANTEXT over V_S and V_P. For x ∈ V_N, we toss a biased coin. With probability 1 − p, we output y as x (i.e., the "invertibility"). Otherwise, we sample y ∈ V_P with probability

Pr[M(x) = y] = p · C_x · e^{-(ε/2)·d_euc(φ(x), φ(y))},    (2)

where C_x = (Σ_{y′∈V_P} e^{-(ε/2)·d_euc(φ(x), φ(y′))})^{-1}. As in SANTEXT, we can also precompute two probability matrices of sizes |V_S| × |V_P| and |V_N| × |V_P|, corresponding to Eq. (1) and Eq. (2), to optimize the sanitizing efficiency. Lastly, the sanitized D̃ of ⟨y_i⟩_{i=1}^{L} can be released to the service provider.

Theorem 1. Given ε ≥ 0 and d_euc over the embedding space φ(V), SANTEXT satisfies MLDP.

Theorem 2. Given ε ≥ 0, p ∈ [0, 1], and d_euc over φ(V), SANTEXT+ satisfies UMLDP.
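The per-token branching of SANTEXT+ can be sketched as below, with toy probabilities over V_P (`santext_plus_token` is an illustrative helper, not the paper's code):

```python
import numpy as np

def santext_plus_token(x, v_s, v_p, probs_row, p, rng):
    """One step of Algorithm 2: a sensitive token is always replaced by a
    draw over V_P (Eq. (1)); a non-sensitive token is kept verbatim with
    probability 1 - p (the 'invertible' branch), else replaced via the
    tailored distribution (Eq. (2))."""
    if x not in v_s and rng.random() < 1 - p:
        return x  # noise-free output, y = x
    return v_p[rng.choice(len(v_p), p=probs_row)]

rng = np.random.default_rng(1)
v_p = ["cancer", "tumor", "diabetes"]   # V_P = V_S here
row = np.array([0.5, 0.3, 0.2])         # toy sampling probabilities over V_P
kept = santext_plus_token("the", set(v_p), v_p, row, p=0.0, rng=rng)
# with p = 0, a non-sensitive token is always kept: kept == "the"
```

Setting p closer to 1 trades the utility of verbatim non-sensitive tokens for extra noise; p = 0.3 is the setting used in the experiments below.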
Their proofs are in Appendix A.2.

NLP over Sanitized Text
WithD's (shared by the users), the service provider can perform any NLP task. In this work, we focus on those built on a pretrained LM, and in particular, we study BERT as an example due to its wide adoption and superior performance. The full NLP pipeline is deployed at the service provider.
Given a piece of (sanitized) text, the embedding layer maps it to a sequence of token embeddings. The encoder computes a sequence representation from the token embeddings, allowing task-specific layers to make predictions. For example, the task layer could be a feed-forward neural network for multi-label classification of a diagnosis system.
The injected noise deteriorates the performance of downstream tasks as the service provider cannot access the raw texts {D}. To mitigate this, we propose two approaches -pretraining the encoder and fine-tuning the full pipeline, which allow the tasks to be "adaptive" to the noise to some extent.
Pretraining BERT over sanitized public corpus. Besides the D̃'s, the service provider can also obtain a massive amount of publicly available text (say, the English Wikipedia). It also has access to the sanitization mechanisms, so it can produce sanitized public text (just as the users produce their D̃'s).
Our key idea is to let the service provider pretrain the encoder (i.e., BERT) over the sanitized public text, making it more "robust" in handling D̃'s. We thus initialize the encoder with the original BERT checkpoint and conduct further pretraining with an adapted masked language model (MLM) loss. In more detail, the adapted MLM objective is to predict the original masked tokens given the sanitized context instead of the context from the raw public text. We note that this is beneficial for improving the task utility, yet it may breach user privacy since the objective learns to "recover" the original tokens or semantics. In Section 5.4, our results will show that such a pretrained BERT indeed improves accuracy, with privacy comparable to the original BERT.
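The adapted MLM objective can be illustrated by how training pairs are built: inputs come from the sanitized text, but masked positions are labelled with the original tokens. This is a simplified sketch; `mlm_pairs` is a hypothetical helper, and the -100 sentinel follows the Transformers convention for positions ignored by the loss.

```python
import random

MASK = "[MASK]"

def mlm_pairs(raw_tokens, sanitized_tokens, mlm_prob, rng):
    """Pair each training input (sanitized context, possibly masked) with
    labels that recover the ORIGINAL token at masked positions only."""
    inputs, labels = [], []
    for raw, san in zip(raw_tokens, sanitized_tokens):
        if rng.random() < mlm_prob:
            inputs.append(MASK)
            labels.append(raw)    # predict the unsanitized token
        else:
            inputs.append(san)
            labels.append(-100)   # ignored by the loss
    return inputs, labels

rng = random.Random(0)
raw = ["the", "drug", "works"]
san = ["the", "pill", "helps"]    # toy sanitized version of the same text
x, y = mlm_pairs(raw, san, mlm_prob=0.15, rng=rng)
```

Note the asymmetry: unmasked positions feed the model sanitized tokens only, so the encoder learns to denoise without ever seeing raw context at input time.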
Fine-tuning the full NLP pipeline. After pretraining BERT using sanitized public text, the service provider can further improve the efficacy of downstream tasks by fine-tuning the full pipeline. We assume that the ground-truth labels are available to the service provider, say, inferred from the D̃'s when they preserve semantics similar to the raw text. Then, the sanitized text-label pairs are used for training/fine-tuning downstream task models, with gradients back-propagated to update the parameters of both the encoder and the task layers. We leave more realistic/complex labeling processes based on sanitized texts as future work.

Definition of "Sensitivity"
Simply treating the top-w fraction of least frequent tokens (e.g., according to a public reference corpus) as the sensitive token set already led to promising results (see Section 5.2). By this definition, stop words are mostly non-sensitive (e.g., for w = 0.9 over the sentiment classification dataset we used, ∼98% of the stop words are deemed non-sensitive). For a context-specific corpus, this strategy is better than merely using stop words; e.g., "breast cancer" becomes non-sensitive among breast-cancer patients.

Sophisticated machine-learning approaches or other heuristics could also be considered, e.g., training over a context-specific reference corpus or identifying tokens with personal (and hence sensitive) information (e.g., names). We leave these as future work.
Moreover, the definition of sensitivity may vary across users. Some may consider a token deemed non-sensitive by most other users sensitive. The original ULDP work (Murakami and Kawamoto, 2019) has discussed a personalized mechanism that preprocesses such tokens by mapping them to a set of semantic tags, which are the same for all users. These tags will be treated as sensitive tokens for the ULDP mechanism. Apparently, this approach is application-specific and may not be needed in some applications; hence we omit it in this work.

Experimental Setup
We consider three representative downstream NLP tasks (datasets) with privacy implications.

Sentiment Classification (SST-2). When people write online reviews, especially negative ones, they may worry about having their identity traced via writing that provides hints of authorship or linkage to other online writings. For this task, we use the preprocessed version in the GLUE benchmark (Wang et al., 2019) of the (binary) Stanford Sentiment Treebank (SST-2) dataset (Socher et al., 2013). Accuracy (w.r.t. the ground truth included in the dataset) is used as the evaluation metric.

Table 1: Utility comparison of sanitization mechanisms under similar privacy levels using the GloVe embedding
Medical Semantic Textual Similarity (Med-STS). Automated processing of patient records is a significant research direction, and one such task is computing the semantic similarity between clinical text snippets for the benefit of reducing the cognitive burden. We choose a very recent MedSTS dataset (Wang et al., 2020) for this task, which assigns a numerical score to each pair of sentences, indicating the degree of similarity. We report the Pearson correlation coefficient (between predicted similarities and human judgments) for this task.

Question Natural Language Inference (QNLI).
Question-answering (QA) aims to automatically answer user questions based on documents. We consider a simplified setting of QA, namely QNLI, which predicts whether a given document contains the answer to the question. We use the QNLI dataset from the GLUE benchmark (Wang et al., 2019).

We implement our sanitization mechanisms in Python and the sanitization-aware training using the Transformers library (Wolf et al., 2020). We use sanitized data to train and test prediction models for all three tasks. We either build vocabularies for the tasks using GloVe embeddings (Pennington et al., 2014) or adopt the same BERT vocabulary (Devlin et al., 2019); Table 2 shows their sizes. Our sanitization-aware pretraining uses WikiCorpus (English version, a 2006 dump, 600M words) (Reese et al., 2010). We start from the bert-base-uncased (instead of a randomly initialized) model to accelerate the pretraining.
We set the maximum sequence length to 512, training epochs to 1, batch size to 6, learning rate to 5e-5, warmup steps to 2000, and MLM probability to 0.15. Our sanitization-aware fine-tuning uses the bert-base-uncased model for SST-2/QNLI, and ClinicalBERT (Alsentzer et al., 2019) for MedSTS. We set the maximum sequence length to 128, training epochs to 3, batch size to 64 for SST-2/QNLI or 8 for MedSTS, and learning rate to 2e-5 for SST-2/QNLI or 5e-5 for MedSTS. Other hyperparameters are kept at their defaults. Our hyperparameters follow the Transformers library (Wolf et al., 2020) and popular setups in the original dataset literature (Wang et al., 2019, 2020).

Figure 3: Performance of SANTEXT+ over (w, p) when fixing ε = 2 based on the GloVe embedding

Comparison of Sanitization Mechanisms
We first compare our SANTEXT and SANTEXT+ with random sanitization and the state of the art of Feyisetan et al. (FBDD). Here, we use the GloVe embedding as in FBDD for a fair comparison. Random sanitization picks a token from the vocabulary uniformly. We set the UMLDP parameters to p = 0.3 and w = 0.9 for SANTEXT+ (while Figure 3 plots the impacts of p and w when fixing ε = 2). Table 1 shows the utility of the four mechanisms for the three selected tasks at different privacy levels. FBDD has higher utility than random replacement. While both FBDD and SANTEXT are based on word embeddings, SANTEXT does not suffer from the "curse of dimensionality" and achieves better utility at the same privacy level. SANTEXT+ achieves the best utility in all cases since it allows the non-sensitive tokens to be noise-free, lowering the noise and improving the utility.
In terms of efficiency, our SANTEXT and SANTEXT+ are very efficient (e.g., ∼2 min for the SST-2 dataset) compared with FBDD (∼117 min) when all run on a 24-core CPU machine. This is because our mechanisms only need to compute the sampling probabilities once and reuse the same probability matrix for each sampling, while FBDD needs to recalculate the additive noise and re-search the nearest neighbor each time.

Mask Token Inference Attack
From now on, we adopt the BERT embedding for its superiority. As (U)MLDP is distance-metric dependent, we need to use different ε's (e.g., Figure 5) to ensure a similar privacy level, specifically, ε·d.
Our sanitization mechanisms provide broad protection against seen/unseen attacks at a fundamental level (by sampling noise to directly replace original tokens) with formally proven DP, e.g., two guesses of the original token with different styles are nearly equally probable in an attempt at authorship attribution (Weggenmann and Kerschbaum, 2018) or other "indirect" attacks. Here, we consider a mask token inference attack as a representative study to "confirm the theory" by empirically measuring the "concrete" privacy level of sanitized texts.
To infer or recover original tokens given the sanitized text, one can let a pretrained BERT model infer the [MASK] token given its contexts; after all, BERT models are trained via masked language modeling. For each sanitized text of the downstream (private) corpus, we replace each token sequentially by the special token [MASK] and input the masked text to the pretrained BERT model to obtain the prediction at the [MASK] position. Then, we compare the predicted token to the original token in the raw text. Figure 4 reports the defense rate (the proportion of unmatched tokens to total tokens) and the task utility of texts sanitized by SANTEXT, as well as unsanitized texts, on SST-2 and QNLI. We see a privacy-utility trade-off: the more restrictive the privacy guarantee (smaller ε), the lower the utility score. Notably, we improve the defense rate substantially with only a small amount of utility loss (e.g., when ε = 16, SANTEXT improves the defense rate by 20% with only 4% task utility loss over the SST-2 dataset in Figure 4).

Table 3: Sanitization-aware pretraining via SANTEXT
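Once the attacker's per-position predictions are collected, the defense rate itself is simple to compute (a sketch with made-up predictions; no actual BERT call is shown):

```python
def defense_rate(original, predicted):
    """Proportion of positions where the mask-token-inference attacker's
    prediction does NOT match the original token (higher = more private)."""
    assert len(original) == len(predicted)
    misses = sum(o != p for o, p in zip(original, predicted))
    return misses / len(original)

orig = ["the", "film", "was", "truly", "awful"]
pred = ["the", "film", "was", "very", "good"]  # hypothetical attacker guesses
rate = defense_rate(orig, pred)  # 2 of 5 positions survive the attack: 0.4
```

A defense rate of 1.0 would mean the attacker never recovers an original token; unsanitized text typically yields a much lower rate since BERT can reconstruct common tokens from context.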

Effectiveness of Pretraining
We then show how the sanitization-aware pretraining further improves the utility but does not hurt the original privacy. Specifically, Table 3 compares the accuracy of sanitization-aware fine-tuning based on the publicly-available bert-base-uncased model and our sanitization-aware pretrained one at different privacy levels on SST-2 and QNLI. Our sanitization-aware pretrained BERT models can obtain a 2% absolute gain on average. We conjecture that it can be improved since our pretraining only uses 1/6 of the data used in the original BERT pretraining and 1 training epoch as an illustration.
To demonstrate that this utility improvement is not obtained by sacrificing privacy, we record the change in defense rate (∆_privacy) when launching mask token inference attacks on the original BERT models and our sanitization-aware pretrained BERT models. As Table 3 confirms, the privacy level of our sanitization-aware pretrained model is nearly the same as the original (sometimes even better).

Influence of Privacy Parameter
We aim at striking a nice balance between privacy and utility by tuning ε. To empirically show the influence of ε, we report the utility and privacy scores over the SST-2 dataset based on SANTEXT. The utility score is the accuracy over the test set. We define three metrics to "quantify" privacy. First, N_x = Pr[M(x) = x], which we estimate by the frequency of seeing no replacement by M(·). The output distribution of x has full support over V, i.e., Pr[M(x) = y] > 0 for any y ∈ V. Yet, we are interested in the effective support S, a set of y's with cumulative probability larger than a threshold; we then define S_x as its size. S_x can be estimated by the number of distinct tokens mapped from x.
Both N_x and S_x can be related to two extremes of the Rényi entropy (Rényi, 1961), defined as

H_α(M(x)) = (1/(1 − α)) · log(Σ_{y∈V} Pr[M(x) = y]^α),

where α → 0 yields the Hartley entropy H_0 = log |S| and α → ∞ yields the min-entropy H_∞ = −log max_y Pr[M(x) = y]. This implies that we can approximate H_0 and H_∞ by log S_x and −log N_x, respectively. Making them large increases the entropy of the distribution.
Another important notion is plausible deniability (Bindschaedler et al., 2017), i.e., a set of x's could have led to an output y with similar probability. We define S*_y as the size of this set, estimated by the number of distinct tokens mapped to y.
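Estimating these three metrics from repeated runs of the mechanism can be sketched as follows (`privacy_metrics` is an illustrative helper over a toy sample log, not the paper's code):

```python
from collections import Counter, defaultdict

def privacy_metrics(samples):
    """samples: dict mapping each input token x to a list of sanitized
    outputs from repeated runs of M. Estimates N_x (self-substitution
    rate), S_x (distinct outputs from x, the effective support), and
    S*_y (distinct inputs mapped to y, a plausible-deniability set size)."""
    n_x, s_x = {}, {}
    inverse = defaultdict(set)
    for x, outs in samples.items():
        counts = Counter(outs)
        n_x[x] = counts[x] / len(outs)
        s_x[x] = len(counts)
        for y in counts:
            inverse[y].add(x)
    s_star = {y: len(xs) for y, xs in inverse.items()}
    return n_x, s_x, s_star

samples = {"good": ["good", "great", "good", "fine"],
           "bad":  ["bad", "good"]}
n_x, s_x, s_star = privacy_metrics(samples)
# n_x["good"] = 0.5, s_x["good"] = 3, s_star["good"] = 2
```

With enough runs, log s_x[x] and −log n_x[x] approximate H_0 and H_∞ as discussed above.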
We run SANTEXT 1,000 times for the whole SST-2 dataset vocabulary. As Figure 5 shows, when ε increases, the utility boosts and N_x increases, while S_x, S*_y, and the privacy level of the mechanism decrease. This gives some intuition for picking ε: e.g., for a ∼40% probability of replacing each token with a different one based on the BERT embeddings (top panel), we could set ε = 15, since the median of N_x is ∼60% and the accuracy is ∼81%.

Conclusion
Great predictive power comes with great privacy risks. The success of language models enables inference attacks. There are only a few works in differentially private (DP) text sanitization, probably due to its intrinsic difficulty. A new approach addressing the (inherent) limitation (e.g., in generality) of existing works is thus needed.
Theoretically, we formulate a new LDP notion, UMLDP, which considers both sensitivity and similarity. While it is motivated by text analytics, it remains interesting in its own right. UMLDP enables our natural sanitization mechanisms without the curse of dimensionality faced by existing works.
Practically, we consider the whole PPNLP pipeline and build in privacy at the root with our sanitization-aware pretraining and fine-tuning. With our simple and clear definition of sensitivity, our work already achieves promising performance. Future research on more sophisticated sensitivity measures will further strengthen our approach.
Surprisingly, our PPNLP solution is discerning like a cryptographic solution: it is kind to the good (maintaining high utility) but not as helpful to the bad (not boosting inference attacks). We hope our results, with different metrics for quantifying privacy, can provide more insights into privacy-preserving NLP and make it accessible to a broad audience.

A Supplementary Formalism Details
A.1 Definition of ULDP

Definition 4 ((V_S, V_P, ε)-ULDP (Murakami and Kawamoto, 2019)). Given V_S ⊆ V, V_P ⊆ V, and a privacy parameter ε ≥ 0, M satisfies (V_S, V_P, ε)-ULDP if it satisfies the following properties: i) for any x, x′ ∈ V and any y ∈ V_P, we have

Pr[M(x) = y] ≤ e^ε · Pr[M(x′) = y];

ii) for any y ∈ V_U = V \ V_P, there is an x ∈ V_N such that

Pr[M(x) = y] > 0 and Pr[M(x′) = y] = 0 for any x′ ≠ x.

A.2 Differential Privacy Guarantee
Proof of Theorem 1. Consider L = 1, i.e., D = ⟨x⟩. For another document D′ = ⟨x′⟩ with x′ ∈ V \ {x} and a possible output y ∈ V:

Pr[M(x) = y] / Pr[M(x′) = y] = (C_x / C_{x′}) · exp((ε/2) · (d(x′, y) − d(x, y))) ≤ exp((ε/2) · d(x, x′)) · exp((ε/2) · d(x, x′)) = exp(ε · d(x, x′)).

The proof, showing SANTEXT ensures ε · d(x, x′)-LDP, mainly relies on the triangle inequality of d: it bounds both d(x′, y) − d(x, y) ≤ d(x, x′) and the ratio C_x / C_{x′} of normalization constants by exp((ε/2) · d(x, x′)).
To generalize to the case of L > 1, we sanitize every token x_i in D independently, and thus:

Pr[M(D) = D̂] = Π_{i=1}^{L} Pr[M(x_i) = y_i].

Then, for any D = ⟨x_1, …, x_L⟩ and D′ = ⟨x′_1, …, x′_L⟩, the privacy bound is given as

Pr[M(D) = D̂] ≤ exp(ε · Σ_{i=1}^{L} d(x_i, x′_i)) · Pr[M(D′) = D̂],

which follows from the composability of DP.
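The sampling step behind this proof can be sketched in a few lines. The snippet below is our own minimal illustration (not the released implementation), assuming a toy embedding table and Euclidean distance: each token is replaced by y with probability proportional to exp(−(ε/2) · d(x, y)), and a document is sanitized token-wise so the per-token bounds compose.

```python
import math
import random

def sanitize_token(x, vocab, emb, eps, rng):
    """Sample y with Pr proportional to exp(-(eps/2) * d(x, y)),
    the exponential-mechanism-style distribution over the vocabulary."""
    weights = [math.exp(-0.5 * eps * math.dist(emb[x], emb[y]))
               for y in vocab]
    return rng.choices(vocab, weights=weights)[0]

def sanitize_doc(tokens, vocab, emb, eps, rng):
    # Token-wise independent sanitization; the eps * d(x_i, x'_i)
    # bounds then compose across the document.
    return [sanitize_token(t, vocab, emb, eps, rng) for t in tokens]

# Toy 2-d "embeddings": "great" is close to "good", "bad" is far.
emb = {"good": (0.0, 1.0), "great": (0.1, 1.0), "bad": (1.0, 0.0)}
vocab = list(emb)
rng = random.Random(0)
out = sanitize_doc(["good", "bad"], vocab, emb, eps=10.0, rng=rng)
```

With ε = 10 here, "good" is far more likely to stay as "good" or flip to the nearby "great" than to the distant "bad", which is exactly the semantics-preserving skew the utility argument relies on.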
Proof of Theorem 2. Consider the case L = 1 with D = ⟨x⟩ and D′ = ⟨x′⟩. For x, x′ ∈ V_S, the output y is restricted to V_P, and the proof is identical to that of the above theorem (as SANTEXT is run over V_S and V_P).
For x, x′ ∈ V_N and y ∈ V_P, we have

Pr[M(x) = y] = p · C_x · exp(−(ε/2) · d(x, y)) ≤ exp(ε · d(x, x′)) · p · C_{x′} · exp(−(ε/2) · d(x′, y)) = exp(ε · d(x, x′)) · Pr[M(x′) = y].

For x ∈ V_S, x′ ∈ V_N, and y ∈ V_P, we have

Pr[M(x) = y] = C_x · exp(−(ε/2) · d(x, y)) ≤ (1/p) · exp(ε · d(x, x′)) · Pr[M(x′) = y].

The probability that a token x ∈ V_N remains unchanged is (1 − p). The above inequalities thus show that SANTEXT+ ensures the properties of UMLDP. Similarly, we use the composability to generalize to L > 1.

A.3 Qualitative Observations
Below, we focus on SANTEXT sanitizing a single token x. We first make two extreme cases explicit.
(1) When ε = 0, the distribution in Eq. (1) becomes Pr[M(x) = y] = 1/|V| for all y ∈ V. SANTEXT is perfectly private since y is sampled uniformly at random, independent of x. Yet, such a y does not preserve any information of x. For a general ε ∈ (0, ∞), the distribution has full support over V, i.e., we have a non-zero probability for any possible y ∈ V such that M(x) = y. Also, given y, y′ ∈ V with d(x, y) < d(x, y′), we have Pr[M(x) = y] > Pr[M(x) = y′]. As ε increases, Pr[M(x) = y] for the y's with large d(x, y) becomes smaller (and even approaches 0). This means that the output distribution becomes "skewed," i.e., the outputs concentrate on those y's with small d(x, y). This is good for utility, which stems from the semantic preservation of every token. On the contrary, too much concentration weakens the privacy.
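Both regimes can be observed by computing the output distribution exactly rather than by sampling. The sketch below (our own illustration, with a toy embedding table) normalizes the exp(−(ε/2) · d(x, y)) weights: ε = 0 yields the uniform distribution, while a large ε concentrates the mass on the nearest tokens.

```python
import math

def output_distribution(x, vocab, emb, eps):
    """Exact Pr[M(x) = y] proportional to exp(-(eps/2) * d(x, y))."""
    w = {y: math.exp(-0.5 * eps * math.dist(emb[x], emb[y]))
         for y in vocab}
    z = sum(w.values())  # normalization constant C_x^{-1}
    return {y: v / z for y, v in w.items()}

emb = {"good": (0.0, 1.0), "great": (0.1, 1.0), "bad": (1.0, 0.0)}
vocab = list(emb)

uniform = output_distribution("good", vocab, emb, eps=0.0)   # all 1/3
skewed = output_distribution("good", vocab, emb, eps=20.0)   # concentrated
```

At ε = 20 the far-away token "bad" receives a vanishing probability, illustrating how the distribution "skews" toward semantically close outputs as ε grows.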
For SANTEXT+, the above results directly apply to the case x ∈ V_S (as SANTEXT is run over V_S and V_P). There is an extra parameter p determining whether an x ∈ V_N is mapped to a y ∈ V_P. If so, the results are similar except with an extra multiplicative factor p. A larger p leads to stronger privacy as the probability (1 − p) of x being unchanged becomes smaller.

Table 4 shows two examples of sanitized texts output by SANTEXT and SANTEXT+ at different privacy levels from the SST-2 and QNLI datasets.
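The two-stage behavior described above can be sketched as follows. This is our own minimal illustration of the SANTEXT+ step (the helper name and toy embeddings are ours): sensitive tokens are always pushed through the ε-weighted sampler, whereas non-sensitive tokens go through it only with probability p and are kept verbatim otherwise.

```python
import math
import random

def santext_plus_token(x, sensitive, p, vocab_out, emb, eps, rng):
    """Sketch of a SANTEXT+-style step: x in V_S is always resampled;
    x in V_N is resampled only with probability p."""
    if x not in sensitive and rng.random() >= p:
        return x  # non-sensitive token kept unchanged w.p. 1 - p
    weights = [math.exp(-0.5 * eps * math.dist(emb[x], emb[y]))
               for y in vocab_out]
    return rng.choices(vocab_out, weights=weights)[0]

emb = {"good": (0.0, 1.0), "great": (0.1, 1.0), "bad": (1.0, 0.0)}
rng = random.Random(0)
# "good" is treated as sensitive here, so it is always resampled.
y = santext_plus_token("good", sensitive={"good"}, p=0.3,
                       vocab_out=list(emb), emb=emb, eps=5.0, rng=rng)
```

Setting p = 0 makes every non-sensitive token pass through unchanged, recovering the deterministic-identity behavior that yields the (1 − p) term in the analysis.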

C Supplementary Related Works
Privacy is a practically relevant topic that also poses research challenges of diverse flavors. Below, we discuss some "less-directly" relevant works, showcasing some latest advances in AI privacy.
Cryptographic Protection of (Text) Analytics. There has been a flurry of results improving privacy-preserving machine-learning frameworks (e.g., (Lou et al., 2020)), which make use of cryptographic tools such as homomorphic encryption and secure multi-party computation (SMC) for general machine/deep learning. These cryptographic designs can, in principle, be adapted for many NLP tasks. Nevertheless, they slow down computations by orders of magnitude since cryptographic tools, especially fully homomorphic encryption, are generally far more heavyweight than DP approaches. One might be tempted to replace cryptography with ad hoc heuristics. Unfortunately, this is known to be error-prone (e.g., a recently proposed attack (Wong et al., 2020) can recover model parameters during "oblivious" inference).
A recent trend (e.g., (Wagh et al., 2021)) relies on multiple non-colluding servers to perform SMC for secure training. However, SMC needs multiple rounds of communication. It is thus more desirable to have a dedicated connection among the servers.
Albeit with better utility (than DP-based designs), cryptographic approaches mostly consider immunity against membership inference to be out of their protection scope, since DP mechanisms could be applied over the training data before the cryptographic processing.
There is a growing interest in privacy-preserving analytics in the NLP community too. Very recently, TextHide (Huang et al., 2020) devised an "encryption" layer for the hidden representations. Unfortunately, it has been shown to be insecure by cryptographers and privacy researchers (Carlini et al., 2020a).
Hardware-Aided Approaches. GPUs can compute linear operations in a batch much faster than CPUs. Nevertheless, we still need a protection mechanism when using GPUs, another protection mechanism for the non-linear operations, and their secure integration. In general, utilizing GPUs for privacy-preserving machine-learning computations is non-trivial (e.g., see  for an extended discussion).
To exploit the parallelism of GPUs while minimizing the use of cryptography, one can resort to a trusted processor (e.g., Intel SGX) for performing non-linear operations within its trusted execution environment (TEE). Note that one still needs cryptographic protocols to outsource the linear computations to the (untrusted) GPU. Slalom (Tramèr and Boneh, 2019) is such a solution supporting privacy-preserving inference. Training is a more challenging task that was left as an open problem; it was recently solved by Goten (Ng et al., 2021). Notably, both works are by cryptographers but are also recognized by the AI community.
Finally, we remark that the use of TEE is not a must in GPU-enabled solutions. For example, GForce (Ng and Chow, 2021) is one of the pioneering works that proposes GPU-friendly protocols for non-linear layers with other contributions.