PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning

Accessing longitudinal multimodal Electronic Healthcare Records (EHRs) is challenging due to privacy concerns, which hinders the use of ML for healthcare applications. Synthetic EHRs generation bypasses the need to share sensitive real patient records. However, existing methods generate single-modal EHRs via unconditional generation or longitudinal inference, which suffers from low flexibility and yields unrealistic EHRs. In this work, we propose to formulate EHRs generation as a text-to-text translation task with language models (LMs), which enables highly flexible event imputation during generation. We also design prompt learning to condition the generation on numerical and categorical demographic features. We evaluate synthetic EHRs quality with two perplexity measures accounting for their longitudinal pattern (longitudinal imputation perplexity, lpl) and the connections across modalities (cross-modality imputation perplexity, mpl). Moreover, we utilize two adversaries, membership and attribute inference attacks, for privacy-preserving evaluation. Experiments on MIMIC-III data demonstrate the superiority of our method for realistic EHRs generation (a 53.1% decrease of lpl and a 45.3% decrease of mpl on average compared to the best baselines) with low privacy risks. Software is available at https://github.com/RyanWangZf/PromptEHR.


Introduction
The prevalence of electronic patient healthcare records fuels the development of machine learning models for many healthcare applications (Choi et al., 2016b,a; Wang et al., 2021a,b; Wang and Sun, 2022a). However, sharing EHR data usually undergoes strict and expensive de-identification and administration processes, and is thus difficult. Although there have been attempts to perturb potentially identifiable attributes as the de-identification step (Emam et al., 2015), they were argued to be not immune to re-identification attacks (El Emam et al., 2011; Choi et al., 2017). Alternatively, generating synthetic but realistic EHRs can circumvent data leakage while preserving the patterns of real EHRs for further research and development (Biswal et al., 2020). Deep generative models like GANs (Goodfellow et al., 2014) and VAEs (Kingma and Welling, 2013) have become popular for unconditional EHRs generation (Choi et al., 2017) and longitudinal EHRs generation (Biswal et al., 2020; Zhang et al., 2020) of diagnosis codes. However, EHRs are often multimodal with different types of events, including diagnoses, procedures, and medications, as well as patient baseline demographic features like age and gender (Johnson et al., 2016). GANs and VAEs usually struggle to model complex multimodal and non-Gaussian distributions as well as sparse one-hot-encoded vectors (Xu et al., 2019). By contrast, generative language models (LMs) have proven highly powerful at representing large and complex distributions over discrete data (e.g., texts) (Liu et al., 2021b; Radford et al., 2021), which makes them promising for EHRs generation.
In this work, we propose to leverage generative language models (LMs) for EHRs generation. We aim to generate a sequence of visits with mixed types of events, e.g., diagnoses and medications. As Fig. 1 shows, previous works make unconditional generation for single-modal static EHRs (Choi et al., 2017) or for single-modal longitudinal EHRs (Zhang et al., 2021). However, real EHRs are heterogeneous, with multiple types of temporal events plus baseline patient features, e.g., demographic information. We seek to (1) generate realistic mixed-type longitudinal EHRs at scale and (2) support flexible conditional generation to fit the need for personalized EHRs. Specifically, our contributions are:
• We propose a new EHRs generation method that makes the best of LMs and enables generating multimodal EHRs.
• We design prompt learning for controllable and flexible EHRs generation with LMs.
• We design comprehensive evaluation for both quality and privacy of the generated EHRs.
Related Works

EHRs Generation
Early works on generating EHRs (Lombardo and Moniz, 2008; Buczak et al., 2010; McLachlan et al., 2016) are rule-based methods. However, they were argued to be incapable of providing realistic data for machine learning tasks and were still vulnerable to re-identification (Choi et al., 2017). Deep generative models powered by deep learning, e.g., variational auto-encoders (VAEs) (Kingma and Welling, 2013) and generative adversarial networks (GANs) (Goodfellow et al., 2014), have gained the most attention recently. Choi et al. (2017) pioneered adapting GANs for discrete patient record generation, namely MedGAN, which was followed by work improving GANs for EHRs generation (Guan et al., 2018; Baowaly et al., 2019; Zhang et al., 2020), and by using VAEs (Biswal et al., 2020), hybrid GANs (Lee et al., 2020; Cui et al., 2020), or conditional GANs (Xu et al., 2019). However, most methods only generate static tabular EHRs or longitudinal single-modal EHRs. GANs are often riddled with mode collapse, non-convergence, and instability, which makes their training tricky in practice (Saxena and Cao, 2021). Moreover, due to their representational limits, GANs struggle to model multimodal distributions and sparse one-hot-encoded vectors (Xu et al., 2019), while EHRs exhibit exactly these properties. By contrast, we bypass these challenges with LMs. A comprehensive review of EHR synthesis is provided by Wang et al. (2022).

Language Models & Prompt Learning
LMs are often used for text generation tasks owing to their auto-regressive nature, e.g., T5 (Raffel et al., 2020) and BART (Lewis et al., 2020). Nonetheless, they cannot be directly applied to EHRs generation since EHRs consist of not only plain clinical notes but also longitudinal sequences of events. Although there have been works on encoding and generating medical texts with LMs (Amin-Nejad et al., 2020; Libbi et al., 2021; Kagawa et al., 2021; Wang and Sun, 2022b), none has addressed synthetic EHRs generation. Prompt learning has been used to control the topic of text generation (Li and Liang, 2021; Yu et al., 2021; Qian et al., 2022). However, these works only consider one-hot encoded topics as the prefix. In this work, we leverage prompt learning for EHRs generation conditioned on patient baseline features, which include both categorical and numerical values.

Methods
In this section, we elaborate on the main framework of PromptEHR, including the problem setting, workflow, and training task formulation. Next, we discuss strategies for generating diverse synthetic EHRs with minor loss of quality. Then, we present the proposed recipe for evaluating both the quality and the privacy-preserving ability of EHRs generation models.

Problem Formulation
Consider N patients, where the n-th patient is represented by X_{n,1:T_n} = {x_n; x_{n,1}, x_{n,2}, ..., x_{n,T_n}}. Here, x_n are the baseline features, e.g., age and gender; x_{n,t} denotes the events that happened at the t-th visit; and T_n is the total number of visits. Each visit contains K types of events, x_{n,t} = {x^1_{n,t}, x^2_{n,t}, ..., x^K_{n,t}}, where x^k_{n,t} = {c_1, c_2, ..., c_l} are all events of type k and l is the number of such events.
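As a concrete illustration, the nested record structure X_{n,1:T_n} can be represented with ordinary Python containers. The field names and medical codes below are hypothetical, chosen only to mirror the notation:

```python
# One patient: baseline features x_n plus a list of visits x_{n,1..T_n},
# each visit holding K typed event lists (here K = 2: diagnosis, medication).
patient = {
    "baseline": {"age": 67, "gender": "F"},        # x_n
    "visits": [                                     # x_{n,1}, ..., x_{n,T_n}
        {"diagnosis": ["401.9", "250.00"],          # x^1_{n,1}
         "medication": ["metformin"]},              # x^2_{n,1}
        {"diagnosis": ["428.0"],
         "medication": ["furosemide", "lisinopril"]},
    ],
}

def num_visits(p):
    """T_n: total number of visits for one patient."""
    return len(p["visits"])

def events_of_type(p, t, k):
    """x^k_{n,t}: all events of modality k in the t-th visit (0-indexed)."""
    return p["visits"][t][k]
```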
We formulate three basic functions to support EHRs generation:

Figure 2: The workflow of PromptEHR. The input longitudinal events are transformed into a code sequence with special tokens, e.g., <v> and </v> enclose events in the same visit; <dx> and </dx> enclose contemporary diagnosis events. Baseline features are encoded into prompt embeddings by two featurizers and then added to the token embeddings. The model decodes autoregressively and is trained with a causal language modeling loss.
• Longitudinal imputation: given historical visits X_{n,1:t} = {x_{n,1}, ..., x_{n,t}}, the model predicts the events in the next visit x_{n,t+1};
• Cross-modality imputation: given a visit with K − 1 types of events, x_{n,t} \ {x^k_{n,t}}, the model predicts the events belonging to modality k;
• Conditional generation: given historical visits X_{n,1:t} and the baseline features x_n, the model makes further predictions.
These functions can be combined to synthesize EHRs from the existing partial EHRs with baseline features or from scratch.

Encoding
The overview is shown in Fig. 2. The first step is to transform the raw inputs X_{n,1:T_n} into token sequences acceptable to the encoder.
Input tokenization. PromptEHR is compatible with any sequence-to-sequence model (Cho et al., 2014). We choose BART (Lewis et al., 2020) as the base model. BART uses a bidirectional encoder, thus allowing arbitrary corruption of the input sequences, and a left-to-right decoder to reconstruct the inputs. Motivated by the application of prompts in language (Liu et al., 2021a), we leverage prompts to specify the inputs. Without loss of generality, we assume two modalities: diagnosis (DX) and medication (Med). Denoting [X] and [Z] as the input and answer slots, we can formulate the longitudinal imputation task as a prefix prompt problem: <v>[X]</v> [Z]. The model tries to fill the answer slot [Z] with the events in the next visit. The cross-modality imputation task is built as a cloze prompt problem: [X] <dx> [Z], where <dx> signifies the start of diagnosis events and [X] represents the multimodal context events.
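A minimal sketch of this linearization step, assuming the special-token vocabulary named above (<v>, <dx>, plus an analogous <med> tag for medications — the exact tag set is an assumption):

```python
def linearize_visits(visits, modality_tags=("dx", "med")):
    """Flatten a list of visits into one token sequence using special tokens:
    <v>...</v> wraps one visit; <dx>...</dx> / <med>...</med> wrap the events
    of one modality inside that visit."""
    tokens = []
    for visit in visits:
        tokens.append("<v>")
        for tag in modality_tags:
            if visit.get(tag):                 # skip modalities absent in this visit
                tokens.append(f"<{tag}>")
                tokens.extend(visit[tag])
                tokens.append(f"</{tag}>")
        tokens.append("</v>")
    return tokens
```

The resulting sequence can then be fed to any sequence-to-sequence tokenizer after registering the angle-bracket tags as special tokens.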
Conditional prompt featurizer. We introduce conditional prompt embeddings to enable conditional generation based on patient features. We consider both categorical features x_cat and numerical features x_num. The categorical prompt embedding e_cat is obtained by

e_cat = W_1 σ(W_0 x_cat + b),    (1)

where W_0, W_1, and b are learnable parameters and σ is a non-linear activation. Therefore, e_cat encodes the instruction of x_cat and steers the LM to generate specific populations. We transform x_num ∈ R^{m_u} into e_num with another set of W_0, W_1, and b. E_cat and E_num are then prepended to the token embeddings to serve as the inputs to the encoder. We build the inputs for the decoder with another featurizer to obtain E′_cat and E′_num, sharing the token embeddings E_tok.
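A sketch of one featurizer and of prepending the prompt embeddings, under the stated assumption that each featurizer is a small two-layer transformation with parameters W_0, W_1, and b (numpy stands in for the actual deep learning framework):

```python
import numpy as np

rng = np.random.default_rng(0)

class PromptFeaturizer:
    """Map a patient feature vector to one prompt embedding (Eq. 1 sketch):
    e = W_1 * relu(W_0 x + b). The two-layer form is an assumption consistent
    with the W_0/W_1/b parameters named in the text."""
    def __init__(self, in_dim, hidden, emb_dim):
        self.W0 = rng.standard_normal((in_dim, hidden)) * 0.02
        self.b = np.zeros(hidden)
        self.W1 = rng.standard_normal((hidden, emb_dim)) * 0.02

    def __call__(self, x):
        h = np.maximum(x @ self.W0 + self.b, 0.0)   # ReLU hidden layer
        return h @ self.W1                           # prompt embedding

def build_encoder_inputs(e_cat, e_num, token_embs):
    """Prepend the conditional prompt embeddings to the token embeddings."""
    return np.vstack([e_cat[None, :], e_num[None, :], token_embs])
```

The decoder side would use a second, independently parameterized featurizer while sharing the token embedding matrix, as described above.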

Decoding & Training
The input tokens for the decoder are the shifted encoder inputs, so that the decoder predicts the next token based on the prior tokens. Denoting the context by X and the target event by x, the true conditional distribution is p(x|X). For instance, in the longitudinal imputation task, the context is the historical record of the patient X_{1:t} and the target is the events in the next visit x_{t+1}. Correspondingly, p(x|X; θ) is the prediction made by the model. We use X̃ ∼ q(X) to represent the perturbed context inputs. The training objective is to minimize the negative log-likelihood

L(θ) = −E_{X̃∼q(X)} [log p(x | X̃; θ)].    (3)

The model is hence pushed to maximize the predicted probability of the true next tokens x conditioned on the corrupted inputs X̃.
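Given per-position log-probabilities over the vocabulary, the objective reduces to an average negative log-likelihood over the target tokens; this is a generic teacher-forced NLL sketch, not the authors' exact implementation:

```python
import numpy as np

def causal_lm_nll(log_probs, targets):
    """Average NLL -(1/L) * sum_l log p(x_l | x_<l, corrupted context; θ).
    log_probs: (L, |V|) array of per-position log-probabilities.
    targets:   length-L sequence of true next-token ids."""
    rows = np.arange(len(targets))
    return -log_probs[rows, targets].mean()
```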
We apply the following corruptions during training: (1) token masking, infilling, and deletion; (2) span shuffling and permutation. For (1), we randomly replace spans of tokens with <mask> or delete them, with span lengths drawn from Poisson(3). For (2), we randomly shuffle the tokens within the same visit and shuffle the modality order within the same visit.
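The two corruption families can be sketched as below; Poisson(3) span lengths are drawn with Knuth's method since the standard library has no Poisson sampler, and the exact corruption schedule is an assumption:

```python
import math
import random

def sample_poisson(lam, rnd):
    """Knuth's method: sample a Poisson(lam)-distributed span length."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rnd.random()
    return k - 1

def span_infill(tokens, rnd, lam=3.0, mask="<mask>"):
    """(1) Text infilling: replace one random span (length ~ Poisson(lam))
    with a single <mask> token."""
    n = min(sample_poisson(lam, rnd), len(tokens))
    start = rnd.randrange(0, len(tokens) - n + 1)
    return tokens[:start] + [mask] + tokens[start + n:]

def shuffle_within_visit(events, rnd):
    """(2) Span permutation: shuffle the order of events inside one visit."""
    out = list(events)
    rnd.shuffle(out)
    return out
```

In practice these corruptions would be applied on the linearized token sequence before encoding, leaving the decoder to reconstruct the uncorrupted sequence.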

Harmless Randomness in Generation
Apart from preciseness, the diversity of the generated data is also of great importance. PromptEHR samples from the conditional distribution, which allows adjusting diversity with many techniques from the natural language generation literature. For instance, to prevent low-probability events, we can apply top-k sampling (Fan et al., 2018). Temperature is also useful to flatten or sharpen the conditional distribution. More advanced methods, e.g., beam search (Welleck et al., 2019) and nucleus sampling (Holtzman et al., 2019), are all available to PromptEHR, which brings great potential for achieving higher-quality EHRs with diversity. By contrast, GANs and VAEs depend on sampling random noise vectors to introduce diversity, which is not controllable and usually undermines generation quality.
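For instance, top-k sampling with temperature over the model's next-event logits can be sketched as follows (a generic implementation, not tied to any particular model):

```python
import numpy as np

def sample_next_event(logits, k=5, temperature=1.0, rng=None):
    """Top-k sampling with temperature: keep the k most probable events,
    renormalize their probabilities, and sample one event index."""
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature  # sharpen/flatten
    top = np.argsort(scaled)[-k:]                           # k best event ids
    probs = np.exp(scaled[top] - scaled[top].max())         # stable softmax
    probs /= probs.sum()
    return int(top[rng.choice(len(top), p=probs)])
```

Lower temperatures concentrate mass on the most likely events; smaller k truncates the long tail of implausible codes.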

Quality Evaluation
We provide a recipe to evaluate EHRs generation along two dimensions: accuracy and privacy. For accuracy, we adopt perplexity, which is commonly used in text generation and is defined as the exponent of the average negative log-likelihood (NLL) per word (Neubig, 2017):

ppl = exp( −(1/L) Σ_{l=1}^{L} log p(v_l | v_{1:l−1}; θ) ),    (5)

where p(v_l | v_{1:l−1}) indicates how the model predicts the next word using all previous words as the context; L is the length of the document; θ is the model parameter. Intuitively, a random predictor produces a ppl equal to the cardinality of the vocabulary |C|. We adapt it to the longitudinal imputation perplexity (lpl) and cross-modality imputation perplexity (mpl), taking the structure of EHRs into account. lpl captures the temporal coherence of the patient visits. For instance, chronic diseases like diabetes can cause complications (e.g., heart disease and kidney failure) in the future. Following Eq. (5), we can write the lpl of a patient's records as

lpl = exp( −(1/Σ_t l_t) Σ_{t=1}^{T} Σ_{l=1}^{l_t} log p(c_l | x_{1:t−1}; θ) ).    (6)

Here, x_t = {c_1, ..., c_{l_t}} are all events during the t-th admission. Inside this admission, concurrent events are independently generated conditioned on previous visits; therefore we can decompose p(x_t | x_{1:t−1}; θ) = Π_{l=1}^{l_t} p(c_l | x_{1:t−1}; θ) and arrive at this result.
mpl accounts for the correlations between modalities. For example, a high body temperature in the lab tests may correspond to a fever in the diagnosis. We focus on the t-th admission, where the joint distribution of all K modalities is p(x^1_t, ..., x^K_t | x_{1:t−1}; θ). We can write the NLL here as

NLL_t = −(1/Σ_k l^k_t) Σ_{k=1}^{K} Σ_{l=1}^{l^k_t} log p(c^k_l | x_t \ x^k_t, x_{1:t−1}; θ),    (8)

where each modality is imputed conditioned on the other modalities of the same visit and the history.
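Given per-event log-probabilities from a trained model, both measures reduce to simple aggregations; this is a sketch of the formulas above, not the evaluation code used in the paper:

```python
import math

def lpl(per_visit_logprobs):
    """Longitudinal imputation perplexity: exponent of the average NLL of
    each event c_l conditioned only on the previous visits x_{1:t-1}.
    per_visit_logprobs: one list per visit of log p(c_l | x_{1:t-1}; θ)."""
    flat = [lp for visit in per_visit_logprobs for lp in visit]
    return math.exp(-sum(flat) / len(flat))

def mpl(per_visit_nll):
    """Cross-modality imputation perplexity: mpl = exp((1/T) * Σ_t NLL_t),
    where NLL_t averages the NLL of each modality's events given the other
    modalities of visit t and the history."""
    return math.exp(sum(per_visit_nll) / len(per_visit_nll))
```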

Privacy Evaluation
It is crucial to measure privacy preservation when sharing synthetic data. We evaluate two privacy risks: membership inference and attribute inference. We split the data into training data D_1 = {X_{n,1:T_n}}_{n=1}^{N} and testing data D_2, and generate synthetic data D_S with the same size as D_1.
Membership Inference. Attackers try to infer the membership of patient records based on the real records they own. We design this adversary based on shadow training (Shokri et al., 2017). In the first stage, a shadow model M_sd is trained on D_S. It tries to mimic the behavior of the generation model in longitudinal inference.
In the second stage, a membership inference dataset is built from M_sd(X) where X ∈ D̃_S ∪ D_2; D̃_S is a subset of D_S with the same size as D_2. A membership classifier M_mi is trained on this dataset. We then evaluate the success rate of M_mi at identifying X ∈ D_1 ∪ D_2. The better the adversary M_sd and M_mi perform in this evaluation, the higher the privacy risk caused by releasing the synthetic EHRs.
Attribute Inference. We build this adversary following (Zhang et al., 2021). In this case, attackers hold some incomplete real records where several sensitive attributes are missing. They take advantage of the synthetic data to infer these attributes. Besides, attackers also hold prior knowledge of the association between attributes, i.e., given the incomplete individual records, how probable another code is in expectation, P_0 = p(v_l | {v_1, ..., v_{l_t}}_{t=1}^{T} \ v_l). With the prior, the attacker trains an attribute imputation model on the synthetic data D_S, i.e., P = p(v_l | {v_1, ..., v_{l_t}}_{t=1}^{T} \ v_l; θ_I). The attacker then believes the code v_l exists when log P − log P_0 ≥ δ, where δ is a pre-defined threshold. In experiments, we train another attribute imputation model on D_1 to approximate the prior knowledge. We evaluate the success rate of this attack. Besides, we create a control arm where another imputation model is trained on the test set. Comparison between the control and the treatment (the imputation model trained on D_S) provides an immediate evaluation of the synthetic data's risk level.
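The attacker's decision rule, and the TPR/FPR obtained at one threshold δ, can be sketched as follows (scores are the log P − log P_0 gaps for truly present and absent attributes):

```python
def attribute_inference(log_p_model, log_p_prior, delta):
    """Attacker's rule: claim code v_l is present iff log P - log P_0 >= δ."""
    return log_p_model - log_p_prior >= delta

def tpr_fpr(scores_present, scores_absent, delta):
    """TPR/FPR at one threshold δ, given log P - log P_0 scores for
    attributes that are truly present (positives) vs. absent (negatives)."""
    tpr = sum(s >= delta for s in scores_present) / len(scores_present)
    fpr = sum(s >= delta for s in scores_absent) / len(scores_absent)
    return tpr, fpr
```

Sweeping δ over a grid yields the TPR/FPR curves reported in the privacy evaluation.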

Experiments
In this section, we design experiments to answer the following questions. Statistics of the MIMIC-III data used are shown in Table 1.
• Q1. How well does PromptEHR perform for EHRs generation compared with state-of-the-art methods on generation quality?
• Q2. Does the synthetic data generated by PromptEHR preserve privacy under membership and attribute inference attacks?
• Q3. How useful is the synthetic data for downstream predictive tasks?
• Q4. How does the size of the training cohort affect generation quality?
Baselines. We compare with the following baselines:
• LSTM+MLP. This baseline leverages an LSTM (Hochreiter and Schmidhuber, 1997) to learn the patient state, extracting temporal visit patterns. Based on the state embeddings, MLP layers impute the probability of events within the visit or for the next visit.
• LSTM+MedGAN (Choi et al., 2017). The original MedGAN cannot perform conditional generation or temporal inference. As in the first baseline, an LSTM captures temporal patterns as the inputs to MedGAN. The generator of MedGAN then tries to conditionally generate records as realistic as possible to fool its discriminator.
• SynTEG (Zhang et al., 2021). One of the most recent EHRs generation methods, it also consists of a state embedding module and an imputation module. It utilizes transformers (Vaswani et al., 2017) for temporal dependency learning and a conditional Wasserstein GAN with gradient penalty (WGAN-GP) (Arjovsky et al., 2017; Gulrajani et al., 2017) for event inference.
• GPT-2 (Radford et al., 2019). We pick GPT-2 as the LM baseline that only performs causal language modeling on EHRs; it can then generate events the same way it generates text.

Evaluation metrics
We use the proposed lpl and mpl to evaluate generation quality. Since the perplexity of different patient records varies significantly, we take the median of the perplexity across patients for the stability of the performance estimate. We use two adversaries, membership inference (MI) and attribute inference (AI), to test the privacy risk. In MI, we use LSTM+MLP as the shadow model to mimic the outputs of PromptEHR, and a three-layer MLP predicts the membership; the ROC curve is plotted to evaluate the attack success rate. In AI, we train an LSTM+MLP on D_1 to approximate the prior and another LSTM+MLP on D_S as the attribute imputation model. To test the utility of the synthetic data for downstream predictive tasks, we train LSTM+MLP on D_S or D_1 and test it on D_2 to compute recall@10/20.
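Recall@k for one visit can be computed as below (a generic implementation for illustration; the paper's exact evaluation script may differ):

```python
def recall_at_k(scores, true_codes, k):
    """Recall@k: fraction of the true next-visit codes that appear among
    the k highest-scoring predicted codes.
    scores: dict mapping each candidate code to its predicted score."""
    topk = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(c in topk for c in true_codes)
    return hits / len(true_codes)
```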

Implementation Details
All LSTM+MLP models consist of a three-layer bi-directional LSTM with 128 hidden dimensions and one 256-dimensional MLP layer, trained with a 1e-4 learning rate by the Adam optimizer (Kingma and Ba, 2014). The 12-layer transformer-based pre-trained GPT-2 is trained with a 1e-5 learning rate and 1e-4 weight decay by Adam. We follow the architectures and training protocols from the original MedGAN and SynTEG papers.
For PromptEHR, we use the BART model as the backbone (Lewis et al., 2020). We use Adam with a learning rate of 1e-5, weight decay of 1e-4, and batch size of 16. The model is trained for 50 epochs, where the first 3 epochs are warm-up steps. During training, the perplexity computed on the validation set is used to pick the best checkpoint. All experiments are conducted with an RTX-3090 GPU, 251 GB RAM, and an AMD Ryzen Threadripper 3970X 32-core CPU.

Q1. Generation Quality
The calculated mpl and lpl of all methods are shown in Table 2. PromptEHR obtains the best results among all methods. On the contrary, LSTM+MedGAN and SynTEG do not achieve better test perplexity than the basic LSTM+MLP. The main reason is that their GAN components take a noise input in addition to the learned temporal state embeddings to make conditional generation. GPT-2 works better than LSTM+MLP on temporal perplexity, thanks to the power of transformers in capturing sequential patterns.
Most methods obtain better mpl than lpl. This is intuitive because models can use the additional in-visit information from the other modalities when imputing the target modality, thus making better predictions. However, GPT-2 performs worse on mpl than on lpl. GPT-2 is trained with a causal language modeling task where it models the sequence autoregressively; without the prompt design, it is confused by the order of events within the same visit, which hurts performance. Fig. 3 compares generation with and without conditional prompts for PromptEHR. We find that conditional prompts significantly improve generation quality as they provide important characteristics of the patients; we are hence able to generate for specific populations with input prompts.

Fig. 4b shows the TPR/FPR of the attribute inference attack based on shadow training with a varying threshold δ. We cut the curve at δ = 4 because all the remaining curves approach zero to its right. The threshold δ adjusts the attacker's confidence level. When δ = 0, as long as the AI inference probability P(v_l) is larger than the prior P_0(v_l), the AI model believes the attribute v_l exists. In this scenario, both models have a high FPR of around 0.6, but the TPR of PromptEHR is only about half that of the control model. The TPR then stays at a much lower level as δ increases, which implies a low attribute leakage risk for the synthetic data generated by PromptEHR. Although the FPR becomes smaller than the control's when δ > 0.8, the TPR of PromptEHR approaches zero after that. That is, being conservative allows PromptEHR to avoid inferring some wrong attributes, but it loses the ability to identify the right attributes at the same time. In a nutshell, the synthetic data generated by PromptEHR has a low risk of leaking attribute information.

Q3. Synthetic EHRs Utility
We aim to measure the utility of the synthetic data for developing predictive models. We compare LSTM models trained on D_S and D_1 with multilabel prediction of diagnosis events, similar to the setting in (Choi et al., 2016b). In particular, we design two experiments: (1) train an LSTM on fully synthetic data and compare its performance with one trained on real data; (2) train an LSTM on a mixture of synthetic and real data, where the synthetic data is regarded as data augmentation.
Fully synthetic data. We test the LSTM performance on 5k, 10k, 30k, and 50k synthetic patient records. For comparison, the model performance on 5k and 10k real records is also tested. Results are shown in Fig. 5. For recall@10 in Fig. 5a, we observe that although 10k synthetic records are not comparable to 5k real records, 30k synthetic records reach better performance than 10k real records. On the other hand, for recall@20 in Fig. 5b, we surprisingly find that 5k synthetic records achieve the same performance as 5k real records. With more synthetic records involved, the LSTM based on 50k synthetic records eventually outperforms its counterpart on 10k real records. This experiment demonstrates that synthetic EHRs by PromptEHR are sufficient to support healthcare applications; synthetic data is expected to achieve performance comparable to real data.

Hybrid synthetic-real data. In Fig. 6, we randomly sample 10k real records from D_1 and combine them with different sizes of synthetic data from D_S. We find that the model trained on the augmented hybrid data has obvious advantages over its counterpart trained on the real data alone. With more synthetic records involved, the model gains better performance. This demonstrates the utility of synthetic data as augmentation in low-resource cases. Besides, from Fig. 6 we see that this hybrid data still yields a model inferior to one trained on all real records. We are thus curious how much synthetic and real data we need to beat this seeming performance upper bound. In other words, can we beat the real data with the synthetic data?
We conduct a further experiment where 30k real records are combined with synthetic data. Note that we have around 40k real training records in total. Results are shown in Fig. 7. It can be seen that 50k synthetic records plus 30k real records train better models than all the real data does.

Q4. Quality w.r.t. Training Size
In practice, the original data source to be shared might be limited in size, which raises the question of how much the generation quality of PromptEHR is influenced by the size of the training cohort. To answer this question, we sample 5k, 10k, and 20k patient records from the training set and test the perplexity of the learned PromptEHR. Results are illustrated in Fig. 8. We plot the performance of the baseline LSTM+MLP trained on all real training records (∼40k) as red dotted lines for comparison. PromptEHR trained on 5k records has worse generation quality than the baseline. When an additional 5k records are involved, PromptEHR not only outperforms the LSTM baseline but also all other baselines reported in Table 2, which demonstrates that PromptEHR is amenable to low-resource settings and superior to the baselines.

Case Study
We demonstrate two use cases of PromptEHR: generating from scratch (Table 3) and generating by completion (Table 4). While previous works handle the former, only PromptEHR handles the completion setting because it makes flexible conditional generation based on either patient features or previous events. In Table 4, our model begins from all diagnoses of one patient and then generates lab tests via cross-modality imputation. Then, we randomly sample one procedure and let the model impute all the remaining procedures based on the diagnoses and lab tests. Iteratively applying this strategy yields diverse and realistic EHRs via conditional generation. We provide explanations of the two synthetic records in Appendix §A.

Conclusion
In this paper, we study how to leverage real EHRs to train a prompt learning based generative language model, namely PromptEHR, for synthetic EHRs generation. Unlike previous EHRs generation methods, PromptEHR is able to learn from and generate heterogeneous EHRs. To evaluate its performance, we draw on the idea of perplexity from the text generation literature and propose two perplexity measures: longitudinal imputation perplexity (lpl) and cross-modality imputation perplexity (mpl). Experiments on MIMIC-III data demonstrate that the quality of the generated EHRs is better than the baselines'. The synthetic data provides both utility and privacy for downstream healthcare applications.

Limitations
This work seeks to generate synthetic records and hence avoid sharing sensitive personal electronic healthcare records for the development of machine learning models. In our experiments, we find the synthetic records generated by PromptEHR robust to two adversaries: membership inference and attribute inference. However, more advanced attacks that exploit synthetic records may still exist, and we obviously cannot exhaust all adversaries through empirical privacy evaluation. From this viewpoint, it is promising to investigate EHRs generation approaches with theoretical guarantees. For instance, we may draw on differential privacy to enhance the current method and provide formal privacy protection.
where l^k_t indicates the number of codes belonging to the k-th modality. Next, we aggregate over all admissions to obtain the final definition of mpl: mpl = exp( (1/T) Σ_{t=1}^{T} NLL_t ).

Figure 3 :
Figure 3: Perplexity compared between generation with (cond.) and without conditional prompts (w/o cond.) for four types of events. Note that for both lpl and mpl, lower is better.

Figure 4 :
Figure 4: Privacy-preserving evaluation on the membership inference (left) and attribute inference (right) adversaries. On the right, the PromptEHR curves indicate the results of the attribute inference model trained on the synthetic data D_S generated by PromptEHR; the Control curves indicate the model trained on the test set D_2.

Figure 5 :
Figure 5: Recall@10/20 of the predictive model on the test set with varying input data size: syn indicates the model trained on fully synthetic data; real-5k/10k indicate models trained on 5k/10k real data. Error bars show the 95% confidence interval, as in the following figures.

Figure 6 :
Figure 6: Recall of the predictive model on the test set with varying input data size: syn+real-10k indicates the model trained on the hybrid of synthetic & 10k real data; real-10k/all indicate models trained on 10k/all real data.

Q2. Privacy Evaluation
We test the privacy-preserving ability of the generated synthetic EHRs by applying membership and attribute inference attacks. Results are illustrated in Fig. 4. Fig. 4a shows the ROC curve of true positive rate (TPR) versus false positive rate (FPR) for membership inference on D_1 ∪ D_2. It clearly shows that the MI model performs close to random guessing (AUC ≃ 0.5), which means the MI attack gains no sensitive membership information when trained on the synthetic data D_S.

Figure 7 :
Figure 7: Recall of the predictive model on the test set with varying input data size: syn+real-30k indicates the model trained on the hybrid of synthetic & 30k real data; real-30k/all indicate models trained on 30k/all real data.

Figure 8 :
Figure 8: Black solid lines show the lpl and mpl of PromptEHR with varying training record sizes. Red dotted lines show the lpl and mpl of the baseline LSTM+MLP trained on all training records (∼40k).

Table 1 :
Statistics of the used MIMIC-III data.

Table 2 :
Longitudinal imputation perplexity (lpl) & cross-modality imputation perplexity (mpl) of models on different kinds of events. Best values are in bold. The ± value indicates the 95% confidence interval.