DPGA-TextSyn: Differentially Private Genetic Algorithm for Synthetic Text Generation

Zhonghao Sun; Zhiliang Tian; Yiping Song; Yuyi Si; Juhua Zhang; Minlie Huang; Kai Lu; Zeyu Xiong; Xinwang Liu; Dongsheng Li

doi:10.18653/v1/2025.findings-acl.831

Abstract

Using large language models (LLMs) carries a risk of privacy leakage, since data containing sensitive information may be used to fine-tune them. Differential privacy (DP) provides theoretical guarantees of privacy protection, but applying it to LLMs in practice still faces a privacy-utility trade-off. To alleviate this problem, researchers have synthesized data under DP using closed-source LLMs with strong generation capabilities (e.g., GPT-4), but without fine-tuning this approach is not flexible enough to fit a given private data distribution. Moreover, such methods can hardly balance the diversity of the synthetic data against its relevance to the target private data without accessing a large amount of private data. To this end, this paper proposes DPGA-TextSyn, which combines general LLMs with a genetic algorithm (GA) to produce relevant and diverse synthetic text under DP constraints. First, we integrate the privacy gene (i.e., metadata) to generate better initial samples. Then, to achieve survival of the fittest while avoiding homogeneity, we select elite samples using privacy nearest neighbor voting and similarity suppression. In addition, we expand the elite samples via genetic strategies such as mutation, crossover, and generation to widen the search scope of the GA. Experiments show that this method significantly improves model performance on downstream tasks while preserving privacy.
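
The elite-selection step named in the abstract (privacy nearest neighbor voting followed by similarity suppression) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details, so the cosine similarity, the Gaussian noise on the vote histogram, the greedy selection rule, and every function and parameter name (`noisy_votes`, `select_elites`, `sigma`, `max_sim`) are assumptions made for illustration. Embeddings are taken as plain lists of floats, and privacy accounting (mapping `sigma` to an epsilon, delta budget) is left out.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def noisy_votes(private_vecs, candidate_vecs, sigma, rng):
    """Each private vector votes for its most similar candidate.

    Gaussian noise with standard deviation `sigma` is then added to every
    count (an assumed mechanism; one private sample changes one count by 1).
    """
    counts = [0.0] * len(candidate_vecs)
    for p in private_vecs:
        best = max(range(len(candidate_vecs)),
                   key=lambda i: cosine(p, candidate_vecs[i]))
        counts[best] += 1.0
    return [c + rng.gauss(0.0, sigma) for c in counts]


def select_elites(candidate_vecs, votes, k, max_sim):
    """Greedily pick up to k candidates by noisy votes.

    A candidate is skipped when its similarity to an already chosen elite
    exceeds `max_sim` (the assumed form of similarity suppression).
    """
    order = sorted(range(len(votes)), key=lambda i: votes[i], reverse=True)
    elites = []
    for i in order:
        if len(elites) == k:
            break
        if all(cosine(candidate_vecs[i], candidate_vecs[j]) <= max_sim
               for j in elites):
            elites.append(i)
    return elites


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]   # synthetic samples
    private = [[1.0, 0.01], [1.0, 0.02], [0.9, 0.3], [0.0, 1.0]]
    votes = noisy_votes(private, candidates, sigma=1.0, rng=rng)
    print(select_elites(candidates, votes, k=2, max_sim=0.9))
```

In this sketch only the noisy histogram touches the private data, so the greedy selection that follows is post-processing and uses no additional privacy budget. The suppression threshold trades diversity against relevance: a lower `max_sim` rejects more near-duplicates even when they received many votes.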

Anthology ID: 2025.findings-acl.831
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 16159–16179
URL: https://aclanthology.org/2025.findings-acl.831/
DOI: 10.18653/v1/2025.findings-acl.831
Cite (ACL): Zhonghao Sun, Zhiliang Tian, Yiping Song, Yuyi Si, Juhua Zhang, Minlie Huang, Kai Lu, Zeyu Xiong, Xinwang Liu, and Dongsheng Li. 2025. DPGA-TextSyn: Differentially Private Genetic Algorithm for Synthetic Text Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 16159–16179, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): DPGA-TextSyn: Differentially Private Genetic Algorithm for Synthetic Text Generation (Sun et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.831.pdf
