TextMixer: Mixing Multiple Inputs for Privacy-Preserving Inference

Xin Zhou, Yi Lu, Ruotian Ma, Tao Gui, Qi Zhang, Xuanjing Huang


Abstract
Pre-trained language models (PLMs) are often deployed as cloud services, enabling users to upload textual data and perform inference remotely. However, users’ personal text often contains sensitive information, and sharing such data directly with the service providers can lead to serious privacy leakage. To address this problem, we introduce a novel privacy-preserving inference framework called MixPi , which prevents plaintext leakage during the inference phase. Inspired by k-anonymity, MixPi aims to obfuscate a user’s private input by mixing it with multiple other inputs, thereby confounding potential privacy attackers. To achieve this, our approach involves: (1) proposing a novel encryption module, Privacy Mixer, which encrypts input from three distinct dimensions: mixing, representation, and position. (2) adopting a pre-trained Multi-input Multi-output network to handle mixed representations and obtain multiple predictions. (3) employing a Privacy Demixer to ensure only the user can decrypt the real output among the multiple predictions. Furthermore, we explore different ways to automatically generate synthetic inputs required for mixing. Experimental results on token and sentence classification tasks demonstrate that MixPi greatly surpasses existing privacy-preserving methods in both performance and privacy.
Anthology ID:
2023.findings-emnlp.244
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3749–3762
Language:
URL:
https://aclanthology.org/2023.findings-emnlp.244
DOI:
10.18653/v1/2023.findings-emnlp.244
Bibkey:
Cite (ACL):
Xin Zhou, Yi Lu, Ruotian Ma, Tao Gui, Qi Zhang, and Xuanjing Huang. 2023. TextMixer: Mixing Multiple Inputs for Privacy-Preserving Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3749–3762, Singapore. Association for Computational Linguistics.
Cite (Informal):
TextMixer: Mixing Multiple Inputs for Privacy-Preserving Inference (Zhou et al., Findings 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.findings-emnlp.244.pdf