NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

Shuo Huang (黄硕); William Maclean; Xiaoxi Kang; Qiongkai Xu; Zhuang Li; Xingliang Yuan; Gholamreza Haffari; Lizhen Qu

doi:10.18653/v1/2025.findings-emnlp.476

Abstract

The widespread use of cloud-based Large Language Models (LLMs) has heightened concerns over user privacy, as sensitive information may be inadvertently exposed during interactions with these services. To protect privacy before sensitive data is sent to those models, we suggest sanitizing sensitive text using two strategies commonly used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP2, through both crowdsourcing and the use of LLMs. Compared with prior work based on differential privacy, which leads to a sharp drop in information utility and to unnatural text, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments.

Anthology ID: 2025.findings-emnlp.476
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8954–8970
URL: https://aclanthology.org/2025.findings-emnlp.476/
DOI: 10.18653/v1/2025.findings-emnlp.476
Cite (ACL): Shuo Huang, William Maclean, Xiaoxi Kang, Qiongkai Xu, Zhuang Li, Xingliang Yuan, Gholamreza Haffari, and Lizhen Qu. 2025. NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 8954–8970, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human (Huang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.476.pdf
Checklist: 2025.findings-emnlp.476.checklist.pdf
