LatticeGen: Hiding Generated Text in a Lattice for Privacy-Aware Large Language Model Generation on Cloud

Mengke Zhang, Tianxing He, Tianle Wang, Lu Mi, Niloofar Mireshghallah, Binyi Chen, Hao Wang, Yulia Tsvetkov


Abstract
In the current user-server interaction paradigm of prompted generation with large language models (LLMs) on cloud, the server fully controls the generation process, which leaves zero options for users who want to keep the generated text private to themselves. For privacy-aware text generation on cloud, we propose LatticeGen, a cooperative protocol in which the server still handles most of the computation while the client controls the sampling operation. The key idea is that the true generated sequence is mixed with noise tokens by the client and hidden in a noised lattice. Only the client knows which tokens are the true ones. Considering potential attacks from a hypothetically malicious server and how the client can defend against it, we propose the repeated beam-search attack and the mixing noise scheme. In our experiments we apply LatticeGen to protect both prompt and generation. It is shown that while the noised lattice degrades generation quality, LatticeGen successfully protects the true generation to a remarkable degree under strong attacks (more than 50% of the semantic remains hidden as measured by BERTScore).
Anthology ID:
2024.findings-naacl.171
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2674–2690
Language:
URL:
https://aclanthology.org/2024.findings-naacl.171
DOI:
Bibkey:
Cite (ACL):
Mengke Zhang, Tianxing He, Tianle Wang, Lu Mi, Niloofar Mireshghallah, Binyi Chen, Hao Wang, and Yulia Tsvetkov. 2024. LatticeGen: Hiding Generated Text in a Lattice for Privacy-Aware Large Language Model Generation on Cloud. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 2674–2690, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
LatticeGen: Hiding Generated Text in a Lattice for Privacy-Aware Large Language Model Generation on Cloud (Zhang et al., Findings 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.findings-naacl.171.pdf