Domain Private Transformers for Multi-Domain Dialog Systems

Large, general purpose language models have demonstrated impressive performance across many different conversational domains. While multi-domain language models achieve low overall perplexity, their outputs are not guaranteed to stay within the domain of a given input prompt. This paper proposes domain privacy as a novel way to quantify how likely a conditional language model is to leak across domains. We also develop policy functions based on token-level domain classification, and propose an efficient fine-tuning method to improve the trained model's domain privacy. Experiments on membership inference attacks show that our proposed method has comparable resiliency to methods adapted from recent literature on differentially private language models.


Introduction
Large language models have enabled significant progress in machine learning and NLP across a wide range of tasks and domains (Bommasani et al., 2021). They perform especially well in settings where little training data is available for new domains of interest. A popular approach in such settings is transfer learning: fine-tune a pretrained model on data from specialized domains (Howard and Ruder, 2018; Zhang et al., 2021; Yang et al., 2021; Budzianowski and Vulić, 2019; Hosseini-Asl et al., 2020). Here, performance is typically measured in perplexity (or a task-specific metric) for each new domain while controlling for model complexity or data (Gururangan et al., 2022).
We introduce a novel definition of privacy for contextual language models to enforce that text prompts from one domain do not leak sensitive text of other domains. Practitioners train generative models on datasets often curated from diverse domains, e.g. news article categories or dialog tasks. Model users are then often interested in safe generation: when prompted with text from one domain, models must not generate sensitive text from other domains. Safe generation is a key requirement for model providers who pool datasets from many contracted companies: each company might require the model to not generate its sensitive text when prompted with text from other companies. We call such safe generation domain privacy. Let {d_1, ..., d_N} be domains from which datasets are created. Let M_D be a model trained on text dataset D. To verify whether M_D is domain private for domain d_i, we can prompt the model with contexts {c_i} from domain d_i and check whether the generations contain sensitive text of domains d_j for j ≠ i.
Our contributions are: we 1) define domain privacy as a new property for contextual language models, 2) propose fine-tuning algorithms that trade off domain privacy for model performance, and 3) conduct extensive experiments for text generation with multi-domain datasets. Domain privacy scales well with the number of domains, and allows for flexible definitions of domain-sensitive text. Our proposed fine-tuning algorithms utilize differentially-private training to attain domain privacy, while achieving good performance.

Related Work
Domain Adaptation Large pretrained language models have been shown to achieve good performance when fine-tuned on small datasets from new domains (Gururangan et al., 2020). To improve efficiency, recent multi-domain approaches leverage multitask learning (Lin et al., 2020), model distillation (Yao et al., 2021), and/or meta-learning (Pan et al., 2021). Hu et al. (2019) propose private meta-learning for discriminative tasks; our work is the first for private multi-domain text generation.
Differentially Private Language Models Differential privacy is a powerful framework that provides rigorous guarantees on training data exposure to adversaries (Dwork and Roth, 2014). Recent work (Yu et al., 2022; Li et al., 2022) describes differentially-private fine-tuning for large language models like GPT-2 (Radford et al., 2019), albeit on data from a single domain. However, standard notions of differential privacy, including those for single-domain language models (Ginart et al., 2021; Shi et al., 2022b,a), are insufficient for multi-domain language modeling. Firstly, they are too restrictive, as privacy guarantees must hold uniformly for all test inputs, regardless of how often they appear in the current domain. Secondly, they assume dataset perturbations at the sample level (Dwork and Roth, 2014) or individual level (Jain et al., 2021), rather than at the domain level.

Preliminaries
We recall a few definitions first. See Appendix A for further details.

Language modeling
Given a text sequence of tokens τ_i = (t_1, ..., t_i), an autoregressive language model estimates the next-token probability P(t_{i+1} | t_1, ..., t_i). The model is trained by minimizing the cross-entropy between the next ground-truth token and the model's predictions. Finally, we can use the model to compute the perplexity (PPL) of a sequence τ_n.
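As a concrete illustration, the following is a minimal sketch of computing sequence perplexity with a public GPT-2 checkpoint, assuming the HuggingFace transformers library; it is not the exact evaluation code used in the paper.

```python
# Minimal sketch: sequence perplexity under an autoregressive LM.
# Assumes the HuggingFace `transformers` library and a public GPT-2 checkpoint.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """PPL(tau_n) = exp(-(1/n) * sum_i log P(t_i | t_1, ..., t_{i-1}))."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean token-level
        # cross-entropy, i.e. the negative mean log-likelihood.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("USR: Change my seat assignment"))
```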
Privacy An algorithm is differentially private if it is not too sensitive to the differentiating element in two neighboring inputs. In language modeling, where the element is a text sequence, users often want to control sensitivity only on the sequence's private tokens, e.g. phone numbers and proper nouns. Shi et al. (2022b) thus define Selective Differential Privacy using policy functions.
A policy function F annotates a sequence τ_n with 0-1 labels; F(τ_n)_i = 1 if the i-th token is private and 0 if public. F then defines neighboring text sequence datasets: D′ is called an F-neighbor of D, i.e. D′ ∈ N_F(D), if they differ in exactly one text sequence on which F does not agree.
Membership Inference Attacks Differential privacy gives theoretical guarantees which may not be applicable in practice (Dwork et al., 2019). Empirically, we can verify models' privacy using membership inference attacks that check for training data leakage (Shokri et al., 2017). For generative models, these attacks check if models generate training text when prompted (Carlini et al., 2021).
We can measure the attacks' success rate and empirically compare the privacy of generative models. Likelihood Ratio (LiRa) membership inference attacks compare target models relative to a reference model (Carlini et al., 2021, 2022). LiRa attacks work as follows: (i) prompt a target model with contexts {c_i} to generate text {x_i}, (ii) rank {x_i} by the generation likelihood ratio PPL_target(x_i | c_i) / PPL_ref(x_i | c_i), and (iii) select the x_i with the highest ratios. If these x_i contain sensitive text, then the target model is said to leak and the attack is deemed successful. Finally, we can compare target models by their LiRa attack success rate = #successes / #non-empty generations.
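The sketch below illustrates the LiRa ranking and success-rate computation described above. The conditional-perplexity helper and the policy interface (a callable that returns 1 if a generation contains sensitive text of another domain) are simplifying assumptions, not the exact attack implementation.

```python
# Sketch of LiRa ranking and success rate, following steps (i)-(iii) above.
import torch

def conditional_ppl(model, tokenizer, context: str, generation: str) -> float:
    """Perplexity of `generation` conditioned on `context` (loss over generation tokens only)."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + generation, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, :ctx_len] = -100  # ignore context tokens in the loss (approximate at the boundary)
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss
    return torch.exp(loss).item()

def lira_success_rate(target, reference, tokenizer, samples, policy, top_k=100) -> float:
    """samples: list of (context, generation) pairs produced by prompting the target model."""
    scored = [
        (conditional_ppl(target, tokenizer, c, x) / conditional_ppl(reference, tokenizer, c, x), x)
        for c, x in samples if x.strip()                   # keep non-empty generations only
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)    # highest ratios first, as in step (iii)
    successes = sum(policy(x) for _, x in scored[:top_k])  # leaks among the selected generations
    return successes / max(len(scored), 1)                 # #successes / #non-empty generations
```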

Domain Privacy
Consider two domains d_i and d_j where i ≠ j. The goal of domain privacy is to check how likely a model is to generate sensitive text from domain d_j when prompted with text from domain d_i. To check if text contains private tokens of domain d_j, we can use a policy function F_j. Since domains d_i and d_j could have inherent overlap, e.g. politics and sports news overlapping due to geopolitics, we use M_{D_j} as a reference model, where D_j = D \ d_j is the dataset obtained by removing text of domain d_j from D. The likelihood of M_{D_j} leaking sensitive text of d_j serves as an upper bound on the target model's leakage. Here D and D_j are neighbors at the domain level w.r.t. F_j, as they differ in one domain.
Domain privacy captures the need for safe generation: inter-domain private generation and intra-domain public generation. It extends Selective Differential Privacy in three ways. Firstly, it requires models to be private at the domain level rather than the token level. Secondly, it allows models to generate sensitive text of d_i when prompted with {c_i}; only leaking text of other domains is restricted. Finally, domain privacy uses LiRa membership inference attacks, which Selective Differential Privacy lacks. Hence, domain privacy can be empirically tested.

Methodology
Next we study domain privacy applied to the problem of generating dialog text.

Policy Functions
A policy function flags text considered sensitive for a domain, enabling us to check for domain privacy. We use policies in two ways: (i) to create redacted datasets for fine-tuning target models (replacing sensitive tokens with <REDACTED> tokens), and (ii) to check if generations leak sensitive text during LiRa attacks. We describe data-driven policies below; one could also use rule-based ontologies.
The Keyword Detection policy checks if any tokens in text τ are in a set of hand-picked keyword tokens K_i sensitive to domain d_i. Formally, F_i^keyword(τ) = 1 if there exists a token t ∈ τ with t ∈ K_i. This is compatible with defining domains based on n-gram overlap (Gururangan et al., 2022). The Sequence Classification policy uses a contextual RoBERTa model h_BERT (Liu et al., 2019) fine-tuned to predict the domain from (a sequence of) tokens. We use a specified threshold z to set F_i^BERT(τ) = 1 when h_BERT's predicted probability for domain d_i exceeds z.
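The two policies can be sketched as follows. The keyword set, the classifier checkpoint name, and the span-level (rather than token-level) labeling of the RoBERTa policy are illustrative assumptions, not the exact policies used in the paper.

```python
# Hedged sketch of the two policy functions described above.
from transformers import pipeline

AIRLINE_KEYWORDS = {"boarding", "seat", "flight", "departure", "confirmation"}  # hypothetical K_i

def keyword_policy(tokens: list[str], keywords=AIRLINE_KEYWORDS) -> list[int]:
    """F_i^keyword: a token is private (1) if it belongs to the domain keyword set K_i."""
    return [1 if t.lower() in keywords else 0 for t in tokens]

# Placeholder name for a RoBERTa model fine-tuned to predict the dialog domain.
domain_clf = pipeline("text-classification", model="roberta-domain-classifier")

def roberta_policy(tokens: list[str], domain: str = "AIRLINE", z: float = 0.8) -> list[int]:
    """F_i^BERT: mark the span private if the classifier assigns domain d_i probability above z."""
    pred = domain_clf(" ".join(tokens))[0]
    private = int(pred["label"] == domain and pred["score"] > z)
    return [private] * len(tokens)
```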

Target Models
There has been much work to protect against membership inference attacks. We describe several target models that we test for domain privacy (in parentheses we define model aliases for future use).
Let D be the dataset and d_i be the domain being tested. As a baseline target, we use a model fine-tuned only on text from D ∩ d_i (DOMAIN_i Only). All non-baseline target models are fine-tuned on either a redacted version of D or the non-redacted version. The first non-baseline target model is fine-tuned on non-redacted data with the AdamW optimizer (Loshchilov and Hutter, 2019) (Public). The second is fine-tuned on redacted data instead (Pub+Redacted). Li et al. (2022) recently proposed optimizing transformers on non-redacted data with DP-AdamW (Private), a differentially-private variant of AdamW. Shi et al. (2022a) optimize for Selective Differential Privacy with a "Just Fine-tune Twice" (JFT) procedure: fine-tune a model with AdamW on redacted data, then use its weights to initialize a model that is fine-tuned with DP-AdamW on non-redacted data. Shi et al. (2022a) show that the model adapts to the linguistic style without generating sensitive tokens. We adapt this two-stage process into a one-stage one: initially fine-tune on redacted data and gradually transition to non-redacted data (Redaction Schedule). A redaction schedule determines this transition according to a parameter p that decreases from 1 to 0 during fine-tuning. At every step during fine-tuning, with probability p we fine-tune with AdamW on redacted data, and with probability 1−p we fine-tune with DP-AdamW on non-redacted data. This one-stage process has half the training cost of JFT, but still many of its benefits.
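A minimal sketch of this one-stage procedure is given below. The schedule callable (such as the expconcave decay sketched in Appendix D), the batch iterators, and the AdamW / DP-AdamW step functions are placeholders (a DP step would come from a library such as Opacus); this is not the exact training code used in the paper.

```python
# Sketch of the one-stage Redaction Schedule loop.
import random

def redaction_schedule_train(model, redacted_batches, clean_batches,
                             adamw_step, dp_adamw_step, schedule, total_steps):
    for t in range(total_steps):
        p = schedule(t, total_steps)                   # probability of using redacted data at step t
        if random.random() < p:
            adamw_step(model, next(redacted_batches))  # non-private AdamW step on redacted data
        else:
            dp_adamw_step(model, next(clean_batches))  # DP-AdamW step on non-redacted data
```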

Datasets
We use the MultiDoGo dataset (Peskov et al., 2019), which consists of task-oriented dialogs simulating user-agent customer service interactions across 6 domains. We use the 3 largest domains: AIRLINE (air travel bookings), MEDIA (telecommunication and cable), and INSURANCE (policy modifications). We preprocess the dataset by adding control tokens to each dialog, such as speaker tokens, a start-of-conversation token <_soc_>, an end-of-conversation token <_eoc_>, and a domain-name token (e.g. <AIRLINE>). Appendix C includes further preprocessing details and examples from the redacted and non-redacted dataset versions.

Training target models
We create 60-20-20 train-validation-test splits for each domain and coalesce the corresponding splits across domains. We tune hyperparameters such as the learning rate using validation perplexity. The threshold z for the RoBERTa policy is set by maximizing the difference in LiRa success rate between the DOMAIN_i Only and Public models (recall that the rate is #successes / #non-empty generations). To obtain the target models in Section 5.2, we fine-tune a pretrained GPT-2 checkpoint on data from all 3 domains. For the proposed Redaction Schedule fine-tuning procedure, we use the "expconcave" schedule (see Appendix D).

LiRa Attacks for MultiDoGo dataset
We conduct LiRa attacks on each target model to test for domain privacy: we check if a model leaks sensitive text of domain d_j when prompted with contexts from domain d_i, i ≠ j. Here we focus on i = AIRLINE. Into each model, we feed 100 prompts from the AIRLINE domain and generate 10 different outputs for each prompt. We use the control tokens as generation stopping criteria and suppress generation of <REDACTED> tokens. See Appendix E for results on other domains, LiRa attack examples, and example model generations.

We compare target models on LiRa success rate and test perplexity. Figure 1 shows these two metrics for each target model. LiRa attacks are more successful w.r.t. the RoBERTa redaction policy than the keyword policy, because the former has higher recall and lower precision. Focusing on the RoBERTa policy, all models except Private and Public fine-tuning have a LiRa success rate lower than the AIRLINE Only baseline. While having comparable domain privacy, JFT has better perplexity and Redaction Schedule has worse perplexity when compared to Pub+Redacted. Domain leakage is generally more sensitive to the learning rate for JFT, while perplexity is more sensitive to the learning rate for Redaction Schedule. We also test running each stage of JFT for half the number of steps, i.e. with total compute comparable to the other models.

Figure 2 shows the Renyi DP guarantee vs. test perplexity for each model. Public has no privacy guarantee (ϵ = ∞), and Pub+Redacted has an ideal guarantee of ϵ = 0 as it is only fine-tuned on redacted data. We further see that for both the keyword and RoBERTa redaction policies, Redaction Schedule models have privacy guarantees ≈35% better than those of JFT.

We observe that vanilla fine-tuning like Public is insufficient for domain privacy. Domain privacy becomes feasible with fine-tuning algorithms designed for Selective Differential Privacy; these algorithms fine-tune partially on redacted datasets built with policies.
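For concreteness, the generation setup for these attacks (suppressing <REDACTED> and stopping at control tokens) can be sketched as follows. The checkpoint path is a placeholder, the decoding hyperparameters are illustrative, and we assume the control tokens were added to the tokenizer vocabulary during fine-tuning.

```python
# Sketch of prompting a fine-tuned target model for the LiRa attacks.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("path/to/fine-tuned-target")  # placeholder
model = GPT2LMHeadModel.from_pretrained("path/to/fine-tuned-target")        # placeholder

redacted_ids = tokenizer(["<REDACTED>"], add_special_tokens=False).input_ids
eoc_id = tokenizer.convert_tokens_to_ids("<_eoc_>")

def generate_continuations(prompt: str, n: int = 10) -> list[str]:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        num_return_sequences=n,
        max_new_tokens=128,
        bad_words_ids=redacted_ids,   # never emit the <REDACTED> token
        eos_token_id=eoc_id,          # stop at the end-of-conversation control token
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs.input_ids.shape[1]
    return [tokenizer.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs]
```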

Conclusions
This paper compares multi-domain language models for dialog data with respect to a new concept, domain privacy. We propose two policies for redacting domain-sensitive tokens, enabling recent differentially-private training algorithms to be used for preserving domain privacy. Future research directions include studying the domain privacy properties of additional training strategies, and understanding the interplay between domain privacy and performance on downstream tasks.

Limitations
Sequence classification policies are more susceptible to data bias and systemic uncertainty than rule-based policies based on keywords or parts of speech. While our policy functions are more general than previous work, they can only approximate the human subjectivity implicit in marking tokens as domain-sensitive. Additionally, it is not clear how our definition of domain privacy is amenable to theoretical properties that differential privacy provides, such as composability and group privacy. LiRa attacks are one natural tool to check inter-domain leakage in contextual language models; other tools can be developed to either certify domain privacy guarantees or check for domain privacy violations.

Ethics/Impact

Models that are not domain private pose a security risk in deployment due to inter-domain leakage. We show that the predominant transfer learning approach, which fine-tunes a single pretrained model on data from several new domains, is risky from a leakage standpoint. We show how membership inference attacks can cause target models to leak training data, and note that these attacks can be extended to real-world models trained on proprietary data. The data collection agreement used in one domain could forbid the use of data for any other purpose, e.g. generation for any other domain. While this was not an ethical concern for the data used in this paper, it remains an open area of discussion for the ML community.

A Further Definitions
Language Modeling The perplexity (PPL) of a text sequence τ_n (w.r.t. an autoregressive language model) is defined as PPL(τ_n) = exp(−(1/n) · Σ_{i=1}^{n} log P(t_i | t_1, ..., t_{i−1})).

Privacy Let A : X → Y be a randomized algorithm. Two input sets X, X′ ⊆ X are neighbors if they differ in exactly one element. Dwork and Roth (2014) define Differential Privacy for A as follows.

Definition A.1 (Differential Privacy) A randomized algorithm A is (ϵ, δ)-differentially private if for all neighboring inputs X, X′ and all sets of outputs S ⊆ Y, Pr[A(X) ∈ S] ≤ e^ϵ · Pr[A(X′) ∈ S] + δ.
To hone in on the private tokens of a text sequence, Shi et al. (2022b) introduce Selective Differential Privacy, which uses policy functions to define neighboring datasets.

Definition A.2 (Policy Function) A policy function F : T → {0, 1}^n annotates tokens in a sequence τ_n ∈ T as private or not: F(τ_n)_i = 1 if the i-th token is private and 0 if public.
Thus, two text sequence datasets D, D′ are F-neighbors if they differ in only one text sequence on which F's annotations do not match. Mironov (2017) shows interchangeability between Renyi differential privacy and differential privacy, i.e. an algorithm satisfying (α, ϵ)-Renyi differential privacy also satisfies (ϵ_δ, δ)-differential privacy for any δ ∈ (0, 1), and vice versa. Renyi differential privacy is defined as follows: a randomized algorithm A satisfies (α, ϵ)-Renyi differential privacy if, for all neighboring inputs X, X′, the Renyi divergence of order α between the output distributions is bounded, D_α(A(X) ∥ A(X′)) ≤ ϵ.

B Computation
All language models were fine-tuned from a public GPT-2 small checkpoint with 124M parameters (Radford et al., 2019).Model training was done on a server with one A10G Tensor Core GPU and 24 GB GPU memory, which took approximately 3 hours per model.

C Data Preprocessing and Experimental setup
As mentioned earlier, we use dialogs from the AIRLINE, MEDIA, and INSURANCE domains of the MultiDoGo dataset. These domains have ≈15k, ≈33k, and ≈14k dialogs respectively. We preprocess dialog samples as follows. Consider a sample "SYS: Hello, you are connected to LMT Airways! How may I help you? USR: Change my seat assignment SYS: . . .". We preprocess this dialog sample by adding a start-of-conversation control token <_soc_>, an end-of-conversation control token <_eoc_>, and a domain-name control token <AIRLINE> before every utterance. The dialog then looks like "<_soc_> <AIRLINE> SYS: Hello, you are connected to LMT Airways! How may I help you? <AIRLINE> USR: Change my seat assignment <AIRLINE> SYS: . . . <_eoc_>". For a dialog sample from the MEDIA domain, we similarly add <MEDIA> control tokens.
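A minimal sketch of this preprocessing step is shown below; the input format (a list of "SPEAKER: utterance" strings) is an assumption for illustration.

```python
# Sketch: wrap each dialog with <_soc_>/<_eoc_> and prefix every utterance
# with its domain control token.
def preprocess_dialog(utterances: list[str], domain: str = "AIRLINE") -> str:
    domain_token = f"<{domain}>"
    body = " ".join(f"{domain_token} {utt}" for utt in utterances)
    return f"<_soc_> {body} <_eoc_>"

dialog = [
    "SYS: Hello, you are connected to LMT Airways! How may I help you?",
    "USR: Change my seat assignment",
]
print(preprocess_dialog(dialog, "AIRLINE"))
# <_soc_> <AIRLINE> SYS: Hello, ... <AIRLINE> USR: Change my seat assignment <_eoc_>
```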
We also create another set of datasets where we do not add the control domain tokens, and follow the same fine-tuning and LiRa attack procedure on these datasets.See Section E.3 for results on this ablation experiment.
Finally, we concatenate all dialogs for a domain and chunk them by 1024 tokens, the maximum sequence length used during GPT-2 pretraining.
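A small sketch of this chunking step, assuming the GPT-2 tokenizer from the HuggingFace transformers library; the tokenizer may warn about sequences longer than the model maximum, which is expected here because we only split token ids and never feed the full sequence to the model.

```python
# Sketch: concatenate all dialogs of a domain and split into 1024-token blocks.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def chunk_domain(dialogs: list[str], block_size: int = 1024) -> list[list[int]]:
    ids = tokenizer(" ".join(dialogs)).input_ids
    return [ids[i:i + block_size] for i in range(0, len(ids) - block_size + 1, block_size)]
```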

C.1 Redaction Policies
Table 1 shows example dialog turns for each dialog domain and redaction policy.

D Redaction Schedules
We experimented with the redaction schedules described in Figure 3. The two-stage JFT process fine-tunes on redacted data with the AdamW optimizer, and then switches to non-redacted data with the DP-AdamW optimizer (Shi et al., 2022a). This corresponds to a trivial step schedule: constant p = 1 for the first half of the training steps and then constant p = 0 for the remaining half. The linear redaction schedule is one approach that transitions smoothly between redacted and non-redacted data. The expconvex schedule decays exponentially fast and is a convex function; it transitions to non-redacted data after just a few training steps. We found that the expconcave schedule outperformed the other schedules: it decays exponentially slowly, causing the trainer to use redacted data for most of the initial training steps. This is in line with Shi et al. (2022a)'s observation that fine-tuning on the new domain with a non-noisy optimizer like AdamW results in a benign initialization. Our expconcave redaction schedule implements this idea in a one-stage fine-tuning process.
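For concreteness, the three schedules can be sketched as follows, using the formulas from the caption of Figure 3; the temperature values are illustrative.

```python
# p(t) is the probability of fine-tuning on redacted data at step t out of T total steps.
import math

def linear(t: int, T: int) -> float:
    return 1.0 - t / T

def expconvex(t: int, T: int, eta: float = 0.5) -> float:
    return math.exp(-eta * t)                   # decays quickly; mostly non-redacted data

def expconcave(t: int, T: int, eta: float = 5.0) -> float:
    return 1.0 - math.exp(eta / T * (t - T))    # decays slowly; mostly redacted data early on
```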

E.1 LiRa Attack Outputs
Tables 2 and 3 show the results of LiRa membership inference attacks on the models in Section 6.4.

E.2 Additional Domains
Figures 4 through 7 show the results of domain leakage experiments when using prompts from the MEDIA and INSURANCE domains.

E.3 Use of Domain Tokens
Figures 8 and 9 show domain privacy tradeoffs when domain control tokens (<AIRLINE> etc.) are removed from all datasets and policy functions. As a general trend, we get similar, if not slightly higher, LiRa success rates.
Figure 2: Renyi DP ϵ vs. test perplexity. Lower is better for both axes. 2x indicates double training cost.

Figure 3: Redaction schedules specify the probability of choosing to fine-tune on redacted data at every training step. If the total number of training epochs is T = 10, (i) the linear schedule decays as 1 − t/T, (ii) the expconvex schedule decays as exp(−η · t), and (iii) the expconcave schedule decays as 1 − exp((η/T) · (t − T)), where η is a temperature parameter.
Xuechen Li, Florian Tramer, Percy Liang, and Tatsunori Hashimoto. 2022. Large language models can be strong differentially private learners. In International Conference on Learning Representations.

Table 1: Example output redactions from the MultiDoGo dataset. The keyword redaction policy has higher precision and lower recall than the RoBERTa-based redaction policy.

Example 1 (AIRLINE domain)
Original: <AIRLINE> USR: my name is raja <AIRLINE> SYS: Could you also help me out with your booking confirmation number? <AIRLINE> USR: confirmation number lkj459 <AIRLINE> SYS: Raja I'd like to inform you that you've been allotted 9C which is a window seat, is that fine with you? <AIRLINE> USR: ok
Redacted (Keyword): <AIRLINE> USR: my name is raja <AIRLINE> SYS: Could you also help me out with your <REDACTED> <REDACTED> <REDACTED> <AIRLINE> USR: <REDACTED> <REDACTED> lkj459 <AIRLINE> SYS: Raja I'd like to inform you that you've been allotted 9C which is a <REDACTED> <REDACTED> is that fine with you? <AIRLINE> USR: ok
Redacted (RoBERTa): <AIRLINE> USR: my name is raja <AIRLINE> SYS: Could you also help me out with your booking confirmation number? <AIRLINE> USR: confirmation number lkj459 <AIRLINE> SYS: <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> <REDACTED> is that fine with you? <AIRLINE> USR: ok

Example 2 (MEDIA domain)
Original: <_soc_> <MEDIA> USR: Hi Cameron, <MEDIA> SYS: Hi, good morning! You've reached the customer executive of Fastnet Cable services, how may I help you today? <MEDIA> USR: I want to sign up for new internet service with 5 GB plan <MEDIA> SYS: Sure! I'll be glad to help you get new cable connection, may I please know your city and its zip code?
Redacted (Keyword): <_soc_> <MEDIA> USR: Hi Cameron, <MEDIA> SYS: Hi, good morning! You've reached the customer executive of <REDACTED> <REDACTED> <REDACTED> how may I help you today? <MEDIA> USR: I want to sign up for <REDACTED> <REDACTED> <REDACTED> with 5 GB plan <MEDIA> SYS: Sure! I'll be glad to help you get <REDACTED> <REDACTED> <REDACTED> may I please know your city and its <REDACTED> <REDACTED>

Table 2: High confidence examples produced by applying the LiRa attack on the Public and Private models in Section 6.4. Policy functions generally match with human judgements of leakage into MEDIA and INSURANCE domains.

Table 3: High confidence examples produced by applying the LiRa attack on the JFT and Redaction Schedule models in Section 6.4. Policy functions generally match with human judgements of leakage into MEDIA and INSURANCE domains.