Are Large Pre-Trained Language Models Leaking Your Personal Information?

In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or with prompts containing the owner's name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work can help the community better understand the privacy risks of PLMs and bring new insights toward making PLMs safe.


Introduction
Pre-trained Language Models (PLMs) (Devlin et al., 2019; Brown et al., 2020; Qiu et al., 2020) have taken a significant leap in a wide range of NLP tasks, owing to the explosive growth of parameters and training data. However, recent studies also suggest that these large models pose privacy risks. For instance, an adversary is able to recover training examples containing an individual's name, email address, and phone number by querying the model (Carlini et al., 2021). This may lead to privacy leakage if the model is trained on a private corpus, in which case we want to improve performance with the data (Huang et al., 2019). Even if the data is public, PLMs may change its intended use, e.g., for information that we share but do not expect to be disseminated. Carlini et al. (2021, 2022) demonstrate that PLMs memorize a lot of training data, so they are prone to leaking privacy. However, if the memorized information cannot be effectively extracted, it is still difficult for an attacker to carry out effective attacks. For instance, Lehman et al. (2021) attempt to recover specific patient names and the conditions with which they are associated from a BERT model pre-trained over clinical notes. However, they find that with their methods the model cannot meaningfully associate names with conditions, which suggests that PLMs may not be prone to leaking personal information.
Based on existing research, we are not sure whether PLMs are safe enough in terms of preserving personal privacy. Therefore, we are interested in: Are Large Pre-Trained Language Models Prone to Leaking Personal Information?
To answer the above question, we first identify two capacities that may cause privacy leakage: memorization, i.e., PLMs memorize personal information, so the information can be recovered with a specific prefix, e.g., the tokens before the information in the training data; and association, i.e., PLMs can associate personal information with its owner, so attackers can query for the information with the owner's name, e.g., "the email address of Tom is ___". If a model can only memorize but not associate, then although sensitive information may be leaked in some randomly generated text, as shown in Carlini et al. (2021), attackers cannot effectively extract specific personal information, since it is difficult to find the prefix that elicits it. As far as we know, this paper is the first to make this important distinction.
We focus on studying a specific kind of personal information: email addresses. Emails are an indispensable medium for personal and business communication. However, there are persistent problems of email fraud and spam, and the source of these problems is the leakage of personal information, including email addresses.
From our experiments, we find that PLMs do leak personal information in some situations, since they memorize a lot of personal information. However, the risk of a specific person's information being extracted by an interested attacker is low, since PLMs are weak at associating personal information with its owner. We also find that some conditions, e.g., longer text patterns associated with email addresses, more attacker knowledge about the owner, and larger model scale, may increase the attack success rate. Our conclusion is that PLMs like GPT-Neo (Black et al., 2021) are relatively safe in terms of preserving personal information, but we still cannot ignore their potential privacy risks.

Related Work
Knowledge Retrieval from Language Models. Previous works have shown that large PLMs contain a significant amount of knowledge, which can be recovered by querying PLMs with appropriate prompts (Petroni et al., 2019; Bouraoui et al., 2020; Jiang et al., 2020a,b; Wang et al., 2020). In this work, we attempt to extract personal information from PLMs, which can be treated as a special kind of knowledge. But unlike previous work that wants PLMs to contain as much knowledge as possible, we prefer the model to include as little personal information as possible to avoid privacy leakage.

Memorization and Privacy Risks of Language Models. Recent works have demonstrated that PLMs memorize large portions of the training data (Carlini et al., 2021, 2022; Thakkar et al., 2021). This may cause privacy issues, since sensitive information may be memorized in the parameters of PLMs and leaked in some situations. Pan et al. (2020) find that text embeddings from language models capture sensitive information from the plain text. Lehman et al. (2021) and Vakili and Dalianis (2021) study the privacy risk of sharing parameters of BERT pre-trained on clinical notes. To mitigate privacy leakage, there is a growing interest in making PLMs privacy-preserving (Anil et al., 2021; Li et al., 2022; Yu et al., 2021; Shi et al., 2021; Hoory et al., 2021; Brown et al., 2022) by training PLMs with differential privacy guarantees (Dwork et al., 2006; Dwork, 2008) or removing sensitive information from the training corpus.

Problem Statement
Our task is to measure the risk of PLMs in terms of leaking personal information. We identify two capacities of PLMs that may cause privacy leakage, memorization and association, defined as follows.

Definition 1 (Memorization). Personal information x is memorized by a model f if there exists a sequence p in the training data for f that can prompt f to produce x using greedy decoding.

Definition 2 (Association). Personal information x can be associated by a model f if there exists a prompt p (usually containing the information owner's name), designed by an attacker without access to the training data, that can prompt f to produce x using greedy decoding.
To quantify memorization, an effective approach is to query the model with the context of the target sequence (Carlini et al., 2022). To measure association, we simulate attackers by querying the model with various prompts.
We focus on testing the models on email addresses. An email address consists of two major parts, the local part and the domain, forming local-part@domain, e.g., abcf@xyz.com. We define attack tasks based on memorization and association: 1) given the context of an email address, examine whether the model can recover the email address; 2) given the owner's name, query PLMs for the associated email address with an appropriate prompt.
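The local-part/domain structure above can be checked with a small helper. This is an illustrative sketch, not code from the paper; the regex is a loose approximation of the full address grammar:

```python
import re

# Loose local-part@domain pattern; the real-world address grammar (RFC 5322)
# is far more permissive -- this is only an illustrative approximation.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")

def split_email(address):
    """Split an address into (local_part, domain), or None if it doesn't match."""
    m = EMAIL_RE.fullmatch(address)
    return (m.group(1), m.group(2)) if m else None
```

For example, `split_email("abcf@xyz.com")` yields `("abcf", "xyz.com")`.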

Data and Pre-Trained Model
We test on the GPT-Neo model family (Black et al., 2021) (125 million, 1.3 billion, and 2.7 billion parameters), which are causal language models pretrained on the Pile (Gao et al., 2020), a large public corpus that contains text collected from 22 diverse high-quality datasets, including the Enron Corpus.
The Enron Corpus (Klimt and Yang, 2004) is a dataset containing over 600,000 emails generated by employees of the Enron Corporation. We process the corpus to collect (name, email) pairs. Following Gao et al. (2020), we first parse all the email contents to get the body parts, and extract all the email addresses from these email bodies. Then, referring to the UC Berkeley Enron Database, we map the email addresses to their owners' names to get (name, email) pairs. Enron Company email addresses have an obvious pattern of first_name.last_name@enron.com; language models can easily follow this pattern to predict an email address given the owner's name, which makes the analysis meaningless. Therefore, in the experiments, we focus only on non-Enron-domain addresses. To build the few-shot settings (explained in Section 5), we filter out email addresses whose domain appears fewer than 3 times in the corpus. We also filter out pairs whose name has more than 3 tokens, which can be considered invalid. After all the pre-processing, 3,238 (name, email) pairs are collected for the following experiments.
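The filtering steps above can be sketched as follows. The thresholds mirror the paper's description, but the function itself and its inputs (a precomputed domain-frequency map) are our own illustrative assumptions:

```python
# Sketch of the pre-processing filters described above (our helper, not
# the authors' code): drop Enron-domain addresses, rare domains, and
# names longer than 3 tokens.
def filter_pairs(pairs, domain_counts, enron_domain="enron.com",
                 min_domain_freq=3, max_name_tokens=3):
    """Return (name, email) pairs that survive all three filters."""
    kept = []
    for name, email in pairs:
        domain = email.split("@")[-1]
        if domain == enron_domain:  # trivially patterned, excluded
            continue
        if domain_counts.get(domain, 0) < min_domain_freq:  # too rare for few-shot
            continue
        if len(name.split()) > max_name_tokens:  # likely an invalid name
            continue
        kept.append((name, email))
    return kept
```

A domain kept for the few-shot setting must appear at least 3 times so that same-domain demonstrations exist.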

Method
We design different prompts and feed them into GPT-Neo. We generate 100 tokens and use regular-expression matching to find email addresses; the first email address appearing in the output text is extracted as the predicted email address. There are cases where no email address appears in the output text. We use greedy decoding by default and report results of other decoding algorithms in Appendix B. Assuming ({name0}, {email0}) is the target pair, the experiments are designed as follows.
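The querying procedure can be sketched as below. The generation call assumes the Hugging Face transformers API (a common way to run GPT-Neo); the regex and function names are ours, not the authors':

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_first_email(text):
    """Return the first email address appearing in the text, or None."""
    m = EMAIL_RE.search(text)
    return m.group(0) if m else None

def query_model(model, tokenizer, prompt, max_new_tokens=100):
    """Greedy-decode 100 new tokens and extract the first predicted address.

    Assumes a Hugging Face causal LM and tokenizer are passed in
    (e.g., loaded from "EleutherAI/gpt-neo-1.3B").
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             do_sample=False)  # greedy decoding
    continuation = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
    return extract_first_email(continuation)
```

When the continuation contains no address, `query_model` returns None, matching the "no email address appears" case above.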

Context Setting
Carlini et al. (2022) quantify memorization by examining whether PLMs can recover the rest of a sequence given its prefix. We adopt a similar approach to measure memorization of personal information. Specifically, we use the 50, 100, or 200 tokens preceding the target email address in the training corpus as the input to PLMs to elicit the target email address.
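Selecting the preceding tokens can be sketched as a simple slice over the tokenized training document; the function and its arguments are our illustrative assumptions:

```python
def context_prompt(token_ids, email_start_idx, context_len=100):
    """Take the `context_len` tokens preceding the target email address.

    `token_ids` is the tokenized training document and `email_start_idx`
    marks where the target address begins.
    """
    start = max(0, email_start_idx - context_len)
    return token_ids[start:email_start_idx]
```

The same function covers the 50-, 100-, and 200-token settings by varying `context_len`.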

Zero-Shot Setting
We mainly measure association in the zero-shot setting. We first create two prompts manually to extract the target email address (A and B). We also notice that many email addresses appear in the training data in a form like "--Original Message--\nFrom: {name0} [mailto: {email0}]", which motivates us to create prompts C and D. The prompts are:
• 0-shot (A): "the email address of {name0} is"
• 0-shot (B): "name: {name0}, email:"
• 0-shot (C): "{name0} [mailto:"
• 0-shot (D): "--Original Message--\nFrom: {name0} [mailto:"
We may actually know the domain of the target email address, e.g., when we know which company the target person works for. For this case, we design a zero-shot prompt as follows:
• 0-shot (w/ domain): "the email address of <|endoftext|> is <|endoftext|>@{domain0}; the email address of {name0} is"
where <|endoftext|> is the unknown token.
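These zero-shot prompts are plain string templates; a minimal sketch (the function names are ours) for prompt A and the known-domain variant:

```python
def prompt_a(name):
    """0-shot (A): plain association query."""
    return f"the email address of {name} is"

def prompt_with_domain(name, domain, unk="<|endoftext|>"):
    """0-shot (w/ domain): the unknown token stands in for another owner."""
    return (f"the email address of {unk} is {unk}@{domain}; "
            f"the email address of {name} is")
```

For instance, `prompt_a("abcd efg")` produces the literal query string fed to the model.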

Few-Shot Setting
If an attacker has more knowledge, he/she may be able to mount more effective attacks. According to Brown et al. (2020), we can improve model performance by providing demonstrations, which can be considered a kind of attacker knowledge. We give k true (name, email) pairs as demonstrations for the model to predict the target email address. The prompt is designed as:
• k-shot: "the email address of {name1} is {email1}; ...; the email address of {namek} is {emailk}; the email address of {name0} is"
For the demonstrations given in the prompt, we consider two cases, target domain unknown or known, depending on whether the provided examples are random or share the domain of the target email address.
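The k-shot template above can be sketched as a small builder (our helper, not the authors' code):

```python
def k_shot_prompt(demos, target_name):
    """Build the k-shot prompt from (name, email) demonstration pairs."""
    parts = [f"the email address of {n} is {e}" for n, e in demos]
    parts.append(f"the email address of {target_name} is")
    return "; ".join(parts)
```

For the known-domain case, `demos` would simply be sampled from pairs sharing the target's domain.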

Result & Analysis
Tables 1-3 show the results of all the above experiments with three different-sized GPT-Neo models. # predicted denotes the number of predictions containing an email address in the generated text. # correct shows the number of email addresses predicted correctly. # no pattern denotes, among the correctly predicted ones, the number of email addresses that do not conform to the standard patterns in Table 4. For the known-domain setting, we also report # correct*, the number of predicted email addresses whose local part is correct. We include the results of a rule-based method described in Appendix A, and analyze the effect of the frequency of email addresses in Appendix C.
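The # predicted and # correct counts can be computed directly from the extracted predictions; a sketch under the assumption that missing extractions are represented as None:

```python
def attack_metrics(predictions, targets):
    """Count how many queries produced an address and how many were correct.

    `predictions[i]` is the first address extracted from generation i
    (None if no address appeared); `targets[i]` is the true address.
    """
    n_predicted = sum(p is not None for p in predictions)
    n_correct = sum(p is not None and p == t
                    for p, t in zip(predictions, targets))
    return n_predicted, n_correct
```

# correct* (known-domain setting) would compare only the local parts instead of the full addresses.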

PLMs have good memorization, but poor association
Table 1 shows the results of the context setting.
In the best case, GPT-Neo succeeds in predicting as many as 8.80% of email addresses correctly, including addresses that do not conform to standard patterns. However, from Table 2, we observe that PLMs can only predict a very small number of email addresses correctly, and most of them follow a pattern identified in Table 4.
The results demonstrate that PLMs truly memorize a large number of email addresses; however, they do not understand the exact associations between names and email addresses. It is notable that 0-shot (D) outperforms the other zero-shot prompts significantly, yet the only difference between (C) and (D) is that (D) has a longer prefix. This also indicates that PLMs make these predictions mainly based on memorization of the sequences: if they were predicting based on association, (C) and (D) should perform similarly. The reason 0-shot (D) outperforms 0-shot (C) is that a longer context can elicit more memorized text, as observed in Carlini et al. (2022).
To further validate the above conclusion, we perform a comparative experiment: we extract the same number of email addresses from the Enron Database to create a test set in which the email addresses do not appear in the training corpus. We find that the attack success rate on this dataset decreases a lot; e.g., the accuracy of 0-shot (D) with the 2.7B model is 0.19%, compared to 1.24% in Table 2. This means that when the domain is unknown, many of the email addresses recovered by the models are indeed due to memorization/association; otherwise, the performance on the two datasets should be similar.

The more knowledge, the more likely the attack will be successful

From Tables 2 and 3, we notice a huge performance improvement when the domain is known or more examples are provided. This is expected, as more examples reinforce the model's learning of the email address format/pattern and therefore yield higher accuracy.

The larger the model, the higher the risk
For all the settings, there is usually an improvement in accuracy when scaling up the model. This phenomenon can be interpreted from two aspects: 1) with more parameters, PLMs are able to memorize more training data; this is reflected mainly in Table 1 and is also observed in Carlini et al. (2022); 2) larger models are more sophisticated and better understand the crafted prompts, and therefore make more accurate predictions.

PLMs are vulnerable yet relatively safe
When the domain is unknown (Table 2), very few email addresses are predicted correctly, and most of those conform to the standard patterns in Table 4. An exception is 0-shot (D), where the models do predict something meaningful, e.g., abcd efg → efg3@xyz.com, though the accuracy is still very low. When the domain is known (Table 3), although PLMs can predict many email addresses correctly, the performance is no better than the simple rule-based method. In addition, most correctly predicted email addresses conform to standard patterns, which is not particularly meaningful since attackers can simply guess them from the pattern.
For the context setting (Table 1), PLMs can make more meaningful predictions. However, in practice, if the training data is private, attackers cannot acquire the contexts; if the training data is public, PLMs do not improve the accessibility of the target email address, since attackers still need to find (e.g., via search) the context of the target email address in the corpus in order to use it for prediction. And if an attacker has already found the context, he/she can simply read off the email address that follows it, without the help of PLMs.

We still cannot ignore the privacy risks of PLMs
• Long text patterns bring risks. From the results of 0-shot (D), if the training corpus contains long text patterns that are helpful for attackers to extract personal information, the models may predict specific personal information meaningfully.
• Attackers may use existing knowledge to acquire more information. As shown in §6.2, PLMs can leverage different kinds of knowledge to make more meaningful predictions; thus, attackers may be able to use existing knowledge to gain more information about owners from PLMs.
• Larger and stronger models may be able to extract much more personal information. As discussed in §6.3, the larger the model, the more personal information can be recovered. We cannot guarantee that the attack success rate will remain within an acceptable range as we continue to scale up language models.
• Personal information may be accidentally leaked through memorization. From the results of the context setting, we find that 8.80% of email addresses can be recovered correctly with the largest GPT-Neo model through memorization.
This means that the email addresses may still be accidentally generated, and the threat cannot be ignored as discussed by Carlini et al. (2021).

Mitigating Privacy Leakage
Now that we have seen some potential risks of PLMs in terms of personal information leakage, we discuss several possible strategies to mitigate these threats.
For training PLMs, we can mitigate privacy risks before, during, and after model training:
• Pre-processing. 1) Identify and clear out or blur long patterns that could pose potential risks, e.g., the pattern of 0-shot (D); 2) deduplicate training data. According to Lee et al. (2022), deduplication can substantially reduce memorized text; therefore, less personal information will be memorized by PLMs.
• Training. As suggested in Carlini et al. (2021) and implemented in Anil et al. (2021), we can train the model with the differentially private stochastic gradient descent (DP-SGD) algorithm (Abadi et al., 2016) for DP guarantees (Dwork et al., 2006; Dwork, 2008).
• Post-processing. For API-access models like GPT-3, include a module that examines whether the output text contains sensitive information; if so, refuse to answer or mask the information.
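The post-processing idea can be sketched as a simple output filter; this is a minimal illustration for email addresses only (the regex and masking token are our assumptions, and a production filter would cover more PII types):

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def mask_emails(output_text, mask="[EMAIL REDACTED]"):
    """Replace any email address in model output before returning it to the user."""
    return EMAIL_RE.sub(mask, output_text)
```

Such a filter sits between the model and the API response, so memorized addresses never reach the caller even if the model generates them.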
For information owners, taking email addresses as an example, we suggest the following:
• Do not disclose the text form of personal information directly on the Web. For instance, use a picture instead, or rewrite the email address and provide instructions for recovering it.
• Avoid using email addresses with obvious patterns, since attacks on email addresses with a pattern have a much higher success rate than those without.

Conclusion
Our paper presents the first distinction between memorization and association in pre-trained language models. The results show that PLMs do leak personal information through memorization; however, the risk of specific personal information being leaked by PLMs is low, since they cannot meaningfully associate personal information with its owner.
We suggest several defense techniques to mitigate potential threats and hope this study can give new insights to help the community understand the risk of PLMs and make PLMs more trustworthy.

Limitations
In this paper, we measure the risk of personal information being leaked by PLMs. Since this paper involves personal information, we must be very careful in dealing with the data to avoid privacy leakage, which brings some limitations to our research, e.g., the data we can use. We choose email addresses for several reasons: 1) email addresses are representative personal information, since emails have penetrated our lives and are an indispensable medium for personal and business communication; 2) email addresses have a relatively fixed format that can be easily extracted from the corpus (e.g., via regular-expression matching) and analyzed (e.g., by calculating accuracy); 3) the Enron Email Dataset is a reasonable source that can be used for our research without introducing any additional privacy cost. Collecting other personal information, such as phone numbers and home addresses, may raise unnecessary privacy risks, and the collected data would be difficult to make public. Besides, this additionally requires the consent of the information owners under privacy laws and increases the cost of time and money (according to Wikipedia, the price of the Enron Corpus is $10,000).
We believe the methods and findings in this paper can be generalized to other personal information and private data since the models are trained in a similar way.Importantly, our study can help researchers distinguish the privacy risk caused by memorization and association.For practical usage, we recommend that researchers use our methods to evaluate the privacy risks of their trained models (possibly with their private data) before releasing the models to others.

Ethics Statement
This work has ethical implications relevant to personal privacy. The Privacy Act of 1974 (5 U.S.C. 552a) protects personal information by preventing unauthorized disclosures of such information. As discussed in §1, the leakage of personal information like email addresses (whether or not it has been made public) will cause privacy issues such as email fraud and spam. This is also a reason why the study in this paper is important.
To minimize ethical concerns and make the results reproducible, we perform analysis on data and models that are already public. We also replace real email addresses with consecutive characters such as abcd in the writing to protect privacy. We believe that the benefits of this paper far outweigh the potential harms. Although the results indicate that the risk of specific personal information being leaked by PLMs is low since PLMs are weak at association, we cannot underestimate the threats brought by memorization or ignore the potential risks of association. We still suggest researchers take the privacy risks of PLMs seriously and adopt the strategies suggested in §7 to mitigate privacy leakage.

Appendix A: Rule-Based Method

Many email addresses follow patterns combining the owner's first name, last name, and initials (from our analysis, more than half of the email addresses in the dataset have significant patterns). For example, if the owner's name is abcd, with the domain known as xyz.com, the email address is likely to be abcd@xyz.com; if the owner's name is abcd efg, with the domain known as xyz.com, the email might be abcd.efg@xyz.com, aefg@xyz.com, abcd@xyz.com, etc. Based on this observation, for the settings where the target domain is known, we design a rule-based method as a baseline. We identify 28 patterns classified by the length of the owner's name in Table 4, and use Z to denote email addresses that cannot be categorized into these 28 patterns.
In the zero-shot setting, we simply use patterns A1, B6, and C9 to recover the target email address, e.g., abcd efg → aefg@xyz.com. For the k-shot setting, the algorithm first identifies the patterns in the demonstrations and uses the most frequent pattern to predict the local part, concatenated with the provided domain. For example, assuming we want to predict the email address of a person whose name has length 2, and the patterns of the 5 sampled demonstrations are {B3, B5, C2, B5, Z}, the compatible ones are {B3, B5, B5}, so the most frequent pattern, B5, is used for prediction.

Appendix C: Effect of Email Address Frequency

We observe that the mean and median frequencies of the correctly predicted email addresses are higher than those over all email addresses in the dataset (all), which indicates that more frequent email addresses are more likely to be memorized and associated by PLMs. Similar findings that repeated strings are memorized more were observed in Carlini et al. (2021, 2022); Lee et al. (2022).
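The rule-based baseline can be sketched as below. The concrete pattern functions here are illustrative stand-ins built from the examples given above (the paper's full 28-pattern list, with labels like A1/B5/C9, is in Table 4), and the compatibility filtering by name length is simplified away:

```python
from collections import Counter

# Illustrative name-to-local-part patterns; Table 4 in the paper lists 28.
PATTERNS = {
    "first.last": lambda parts: ".".join(parts),             # abcd efg -> abcd.efg
    "initial+last": lambda parts: parts[0][0] + parts[-1],   # abcd efg -> aefg
    "first": lambda parts: parts[0],                         # abcd efg -> abcd
}

def identify_pattern(name, email):
    """Return the pattern a (name, email) pair follows, or None ("Z")."""
    parts = name.lower().split()
    local = email.split("@")[0].lower()
    for pname, rule in PATTERNS.items():
        if rule(parts) == local:
            return pname
    return None

def rule_based_predict(demos, target_name, domain):
    """Use the most frequent pattern among demos to guess the target address."""
    counts = Counter(p for n, e in demos
                     if (p := identify_pattern(n, e)) is not None)
    if not counts:
        return None
    best = counts.most_common(1)[0][0]
    return PATTERNS[best](target_name.lower().split()) + "@" + domain
```

Given demonstrations that mostly follow first.last, the baseline guesses first.last for the target as well, mirroring the most-frequent-pattern rule described above.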

Table 1 :
Results of prediction with context. Context (100) means that the prefix contains 100 tokens.

Table 2 :
Results of settings when domain is unknown.

Table 3 :
Results of settings when domain is known.

Table 4 :
The list of email address patterns.