Xingliang Yuan
2025
NAP2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human
Shuo Huang | William Maclean | Xiaoxi Kang | Qiongkai Xu | Zhuang Li | Xingliang Yuan | Gholamreza Haffari | Lizhen Qu
Findings of the Association for Computational Linguistics: EMNLP 2025
The widespread use of cloud-based Large Language Models (LLMs) has heightened concerns over user privacy, as sensitive information may be inadvertently exposed during interactions with these services. To protect privacy before sending sensitive data to those models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models (LLMs). Compared to prior work based on differential privacy, which leads to a sharp drop in information utility and unnatural texts, the human-inspired approaches result in more natural rewrites and offer an improved balance between privacy protection and data utility, as demonstrated by our extensive experiments.
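As a rough illustration of the two human-inspired strategies above, the Python sketch below applies deletion and abstraction to a marked sensitive span. The span offsets, category labels, and the abstraction lookup table are hypothetical and are not drawn from the NAP2 corpus or its annotation scheme.

```python
# A minimal, hypothetical sketch of the two human-inspired sanitization
# strategies: (i) deleting a sensitive span, and (ii) abstracting it into
# a more general expression. Not the paper's actual rewriting pipeline.

from dataclasses import dataclass

@dataclass
class SensitiveSpan:
    start: int          # character offset where the sensitive span begins
    end: int            # character offset where it ends (exclusive)
    category: str       # e.g. "EMPLOYER", "CITY" -- hypothetical labels

# Hypothetical generalization table: maps a category to a vaguer phrase.
ABSTRACTIONS = {
    "EMPLOYER": "my employer",
    "CITY": "a nearby city",
}

def delete_span(text: str, span: SensitiveSpan) -> str:
    """Strategy (i): drop the sensitive expression entirely."""
    return (text[:span.start] + text[span.end:]).replace("  ", " ").strip()

def abstract_span(text: str, span: SensitiveSpan) -> str:
    """Strategy (ii): replace the sensitive expression with a generalization."""
    replacement = ABSTRACTIONS.get(span.category, "something")
    return text[:span.start] + replacement + text[span.end:]

if __name__ == "__main__":
    text = "I just started working at Monash University in Melbourne."
    span = SensitiveSpan(start=26, end=43, category="EMPLOYER")
    print(delete_span(text, span))    # drops "Monash University"
    print(abstract_span(text, span))  # "... working at my employer in Melbourne."
```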
Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search
Shuo Huang | Xingliang Yuan | Gholamreza Haffari | Lizhen Qu
Findings of the Association for Computational Linguistics: EMNLP 2025
The increasing adoption of large language models (LLMs) in cloud-based services has raised significant privacy concerns, as user inputs may inadvertently expose sensitive information. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. In this work, we propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness. Our method incrementally rewrites privacy-sensitive segments through a structured search guided by a reward model, enabling dynamic exploration of the rewriting space. Experiments on privacy-sensitive datasets show that our approach significantly outperforms existing baselines, achieving a superior balance between privacy protection and utility preservation.
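The Python sketch below illustrates the general shape of an iterative, reward-guided tree search over candidate rewrites, as one reading of the approach described above; the `propose_rewrites` expansion and the `reward` scorer are toy stand-ins, not the paper's actual LLM rewriter or reward model.

```python
# A hypothetical sketch of iterative tree search over sentence rewrites:
# each node is a candidate rewrite, children are produced by obfuscating or
# deleting one sensitive segment, and a reward scores the privacy/utility
# trade-off. Illustrative only.

import heapq
from typing import Callable, List, Tuple

def tree_search_rewrite(
    sentence: str,
    propose_rewrites: Callable[[str], List[str]],  # expands a node into candidate rewrites
    reward: Callable[[str], float],                # higher = better privacy/utility balance
    beam_width: int = 4,
    max_depth: int = 3,
) -> str:
    """Iteratively explore the rewriting space, keeping the best-scoring rewrite."""
    frontier: List[Tuple[float, str]] = [(-reward(sentence), sentence)]
    best_score, best_text = -frontier[0][0], sentence

    for _ in range(max_depth):
        next_frontier: List[Tuple[float, str]] = []
        for _neg_score, text in frontier:
            for candidate in propose_rewrites(text):
                score = reward(candidate)
                heapq.heappush(next_frontier, (-score, candidate))
                if score > best_score:
                    best_score, best_text = score, candidate
        # Keep only the top `beam_width` candidates for the next iteration.
        frontier = heapq.nsmallest(beam_width, next_frontier)
        if not frontier:
            break
    return best_text

if __name__ == "__main__":
    # Toy stand-ins: delete one "sensitive" token per expansion; reward rewrites
    # that leak less while keeping some content. Real systems would use an LLM
    # rewriter and a learned reward model.
    SENSITIVE = {"Alice", "Melbourne"}

    def propose(text: str) -> List[str]:
        words = text.split()
        return [" ".join(words[:i] + words[i + 1:])
                for i, w in enumerate(words) if w.strip(".,") in SENSITIVE]

    def score(text: str) -> float:
        leaked = sum(w.strip(".,") in SENSITIVE for w in text.split())
        return -leaked + 0.01 * len(text.split())

    print(tree_search_rewrite("Alice moved to Melbourne last week.", propose, score))
```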
2023
FaLA: Fast Linear Adaptation for Replacing Backbone Models on Edge Devices
Shuo Huang | Lizhen Qu | Xingliang Yuan | Chunyang Chen
Findings of the Association for Computational Linguistics: EMNLP 2023
In this work, we study the language model backbone replacement problem for personalized downstream tasks in a non-stationary on-device scenario. In the real world, companies may periodically update the knowledge and architectures of backbone models to stay competitive in the market; meanwhile, to accommodate users' own preferences, models are personalized locally to fit each user's distribution. Traditional full model tuning or transfer learning for such replacements often incurs considerable local device training costs and necessitates extensive backpropagation within deep transformer layers. Addressing this issue, we propose a novel, lightweight tuning method for personalized NLP classification tasks post-backbone replacement. Our approach leverages a personalized matrix calculated from documents corresponding to users' old and new backbones. This matrix facilitates top-layer parameter tuning, drastically reducing backpropagation computation. To further mitigate the training costs associated with linear optimization of the matrix, we employ correlation clustering to curate a small set of examples from each user's personalized clusters. Our method achieves over a 1000-fold reduction in backpropagation FLOPs and provides a user-specific initialization for the personalized matrix, yielding a significant performance boost compared with popular transfer learning methods.
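As a hedged illustration of the backbone-replacement idea above, the sketch below fits a linear map between a user's document features under the old and new backbones and uses it to initialize the new top layer. The closed-form ridge least-squares formulation and the head composition are my own simplification for illustration and are not claimed to be the exact FaLA procedure.

```python
# Illustrative sketch: reuse a user's old classifier head after a backbone swap
# via a linear map between feature spaces. Simplified; not the exact FaLA method.

import numpy as np

def fit_feature_map(feats_new: np.ndarray, feats_old: np.ndarray,
                    ridge: float = 1e-3) -> np.ndarray:
    """Solve W such that feats_new @ W ~= feats_old (ridge least squares).

    feats_new: (n_docs, d_new) features of the user's documents, new backbone.
    feats_old: (n_docs, d_old) features of the same documents, old backbone.
    """
    d_new = feats_new.shape[1]
    gram = feats_new.T @ feats_new + ridge * np.eye(d_new)
    return np.linalg.solve(gram, feats_new.T @ feats_old)     # (d_new, d_old)

def init_new_head(old_head: np.ndarray, feature_map: np.ndarray) -> np.ndarray:
    """Compose the personalized map with the old top-layer weights.

    old_head: (d_old, n_classes) weights of the user's old classifier.
    Returns a (d_new, n_classes) initialization for the new top layer, which can
    then be tuned cheaply without backpropagating through the backbone.
    """
    return feature_map @ old_head

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_old = rng.normal(size=(32, 768))     # a few personal documents, old backbone
    feats_new = rng.normal(size=(32, 1024))    # same documents, new backbone
    old_head = rng.normal(size=(768, 2))       # user's old binary classifier head
    W = fit_feature_map(feats_new, feats_old)
    new_head = init_new_head(old_head, W)
    print(new_head.shape)                      # (1024, 2)
```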
Co-authors
- Shuo Huang (黄硕) 3
- Lizhen Qu 3
- Gholamreza Haffari 2
- Chunyang Chen 1
- Xiaoxi Kang 1