Yuran Wang


2023

pdf bib
TextObfuscator: Making Pre-trained Language Model a Privacy Protector via Obfuscating Word Representations
Xin Zhou | Yi Lu | Ruotian Ma | Tao Gui | Yuran Wang | Yong Ding | Yibo Zhang | Qi Zhang | Xuanjing Huang
Findings of the Association for Computational Linguistics: ACL 2023

In real-world applications, pre-trained language models are typically deployed on the cloud, allowing clients to upload data and perform compute-intensive inference remotely. To avoid sharing sensitive data directly with service providers, clients can upload numerical representations rather than plain text to the cloud. However, recent text reconstruction techniques have demonstrated that it is possible to transform representations into original words, suggesting that privacy risk remains. In this paper, we propose TextObfuscator, a novel framework for protecting inference privacy by applying random perturbations to clustered representations. The random perturbations make the representations indistinguishable from surrounding clustered representations, thus obscuring word information while retaining the original word functionality. To achieve this, we utilize prototypes to learn clustered representation, where tokens of similar functionality are encouraged to be closer to the same prototype during training. Additionally, we design different methods to find prototypes for token-level and sentence-level tasks, which can improve performance by incorporating semantic and task information. Experimental results on token and sentence classification tasks show that TextObfuscator achieves improvement over compared methods without increasing inference cost.

2022

pdf bib
TextFusion: Privacy-Preserving Pre-trained Model Inference via Token Fusion
Xin Zhou | Jinzhu Lu | Tao Gui | Ruotian Ma | Zichu Fei | Yuran Wang | Yong Ding | Yibo Cheung | Qi Zhang | Xuanjing Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recently, more and more pre-trained language models are released as a cloud service. It allows users who lack computing resources to perform inference with a powerful model by uploading data to the cloud. The plain text may contain private information, as the result, users prefer to do partial computations locally and upload intermediate representations to the cloud for subsequent inference. However, recent studies have shown that intermediate representations can also be recovered to plain text with reasonable accuracy, thus the risk of privacy leakage still exists. To address this issue, we propose TextFusion, a novel method for preserving inference privacy. Specifically, we train a Fusion Predictor to dynamically fuse token representations, which hides multiple private token representations behind an unrecognizable one. Furthermore, an adversarial training regime is employed to privatize these representations. In this way, the cloud only receives incomplete and perturbed representations, making it difficult to accurately recover the complete plain text. The experimental results on diverse classification tasks show that our approach can effectively preserve inference privacy without significantly sacrificing performance in different scenarios.