DYNTEXT: Semantic-Aware Dynamic Text Sanitization for Privacy-Preserving LLM Inference

Juhua Zhang; Zhiliang Tian; Minghang Zhu; Yiping Song; Taishu Sheng; Siyi Yang; Qiunan Du; Xinwang Liu; Minlie Huang; Dongsheng Li

doi:10.18653/v1/2025.findings-acl.1038

Abstract

LLMs face privacy risks when handling sensitive data. To ensure privacy, researchers use differential privacy (DP), which provides protection by adding noise during LLM training. However, users may be hesitant to share their complete data with LLMs. Researchers therefore follow local DP (LDP) to sanitize the text on the user side and feed only non-sensitive text to LLMs. The sanitization usually relies on a fixed non-sensitive token list or a fixed noise distribution, which risks either exposure to attacks or semantic distortion. We argue that a token's protection level should be adjusted adaptively according to its semantic information in order to balance the privacy-utility trade-off. In this paper, we propose DYNTEXT, an LDP-based dynamic text sanitization method for privacy-preserving LLM inference, which dynamically constructs semantic-aware adjacency lists for sensitive tokens and samples non-sensitive tokens from them for perturbation. Specifically, DYNTEXT first develops a semantic-based density modeling under DP to extract each token's density information. We propose a token-level smoothing sensitivity that combines the ideas of global sensitivity (GS) and local sensitivity (LS) and dynamically adjusts the noise scale, avoiding both the excessive noise of GS and the privacy leakage of LS. Then, we dynamically construct an adjacency list for each sensitive token based on its semantic density information. Finally, we apply the replacement mechanism to sample non-sensitive, semantically similar tokens from the adjacency list to replace sensitive tokens. Experiments show that DYNTEXT outperforms strong baselines on three datasets.
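
The last two steps of the abstract (building a per-token adjacency list, then sampling a replacement from it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy embeddings, the sensitive-token set, the similarity threshold, the `<redacted>` fallback, and all function names are assumptions made for illustration. The DP density modeling and the token-level smoothing sensitivity are omitted entirely, and the sampling step only borrows the exponential-mechanism weighting without claiming a formal privacy guarantee.

```python
import math
import random

# Toy 2-d embeddings and a toy sensitive-token set; both are made up for illustration.
TOY_EMBEDDINGS = {
    "london": (1.0, 0.0),
    "paris": (0.9, 0.1),
    "city": (0.8, 0.3),
    "alice": (0.1, 0.9),
    "banana": (0.0, 1.0),
}
TOY_SENSITIVE = {"london", "alice"}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adjacency_list(token, embeddings, sensitive, threshold):
    """Return (similarity, candidate) pairs for non-sensitive tokens near `token`.

    The list length is not fixed: a token in a dense region of the embedding
    space gets more candidates than a token in a sparse region.
    """
    anchor = embeddings[token]
    scored = [
        (cosine(anchor, vec), cand)
        for cand, vec in embeddings.items()
        if cand not in sensitive
    ]
    near = [pair for pair in scored if pair[0] >= threshold]
    # Sparse neighbourhood: fall back to the single closest non-sensitive token.
    if not near and scored:
        near = [max(scored)]
    return near


def sample_replacement(near, epsilon, rng):
    """Sample a candidate with probability proportional to exp(epsilon * sim / 2)."""
    weights = [math.exp(epsilon * sim / 2.0) for sim, _ in near]
    candidates = [cand for _, cand in near]
    return rng.choices(candidates, weights=weights, k=1)[0]


def sanitize(tokens, embeddings, sensitive, threshold, epsilon, rng):
    """Replace each sensitive token; pass every other token through unchanged."""
    out = []
    for tok in tokens:
        if tok not in sensitive:
            out.append(tok)
            continue
        near = []
        if tok in embeddings:
            near = adjacency_list(tok, embeddings, sensitive, threshold)
        # Hypothetical fallback when no replacement candidate exists.
        out.append(sample_replacement(near, epsilon, rng) if near else "<redacted>")
    return out


print(sanitize(["alice", "visited", "london"],
               TOY_EMBEDDINGS, TOY_SENSITIVE, 0.9, 1.0, random.Random(0)))
```

In this toy setup a larger `epsilon` concentrates the sampling on the most similar candidates (better utility, weaker protection), while a smaller `epsilon` spreads it more evenly across the adjacency list.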

Anthology ID: 2025.findings-acl.1038
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20243–20255
URL: https://aclanthology.org/2025.findings-acl.1038/
DOI: 10.18653/v1/2025.findings-acl.1038
Cite (ACL): Juhua Zhang, Zhiliang Tian, Minghang Zhu, Yiping Song, Taishu Sheng, Siyi Yang, Qiunan Du, Xinwang Liu, Minlie Huang, and Dongsheng Li. 2025. DYNTEXT: Semantic-Aware Dynamic Text Sanitization for Privacy-Preserving LLM Inference. In Findings of the Association for Computational Linguistics: ACL 2025, pages 20243–20255, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): DYNTEXT: Semantic-Aware Dynamic Text Sanitization for Privacy-Preserving LLM Inference (Zhang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1038.pdf