William Maclean
2025
NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human
Shuo Huang | William Maclean | Xiaoxi Kang | Qiongkai Xu | Zhuang Li | Xingliang Yuan | Gholamreza Haffari | Lizhen Qu
Findings of the Association for Computational Linguistics: EMNLP 2025
The widespread use of cloud-based Large Language Models (LLMs) has heightened concerns over user privacy, as sensitive information may be inadvertently exposed during interactions with these services. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP2, through both crowdsourcing and the use of LLMs. Compared to prior works based on differential privacy, which lead to a sharp drop in information utility and unnatural texts, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments.