TextHide: Tackling Data Privacy in Language Understanding Tasks

Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, Sanjeev Arora


Abstract
An unsolved challenge in distributed or federated learning is how to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide to address this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. This encryption step is efficient and affects task performance only slightly. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations, with an average accuracy reduction of only 1.9%. We also present an analysis of the security of TextHide using a conjecture about the computational intractability of a mathematical problem.
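The abstract's "simple encryption step" can be illustrated in the spirit of InstaHide, which TextHide builds on: each sentence representation is mixed with a few other representations using random convex weights, then masked by a random sign flip per coordinate. The sketch below is an illustration under those assumptions, not the paper's exact implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def texthide_encrypt(reps, idx, k=4, rng=None):
    """Sketch of an InstaHide-style encryption of one sentence representation.

    reps: (n, d) array of sentence representations (e.g., BERT [CLS] vectors)
    idx:  index of the representation to hide
    k:    number of representations mixed together (illustrative default)
    """
    if rng is None:
        rng = np.random.default_rng()
    # Pick k-1 other representations to mix in (sketch: sampled uniformly).
    others = rng.choice(len(reps), size=k - 1, replace=False)
    # Random convex combination weights summing to 1.
    lam = rng.dirichlet(np.ones(k))
    mix = lam[0] * reps[idx]
    for weight, j in zip(lam[1:], others):
        mix = mix + weight * reps[j]
    # Random per-coordinate sign mask, unknown to the eavesdropper.
    sigma = rng.choice([-1.0, 1.0], size=reps.shape[1])
    return sigma * mix
```

Because each encrypted vector depends on fresh random weights and a fresh sign mask, an eavesdropper who sees only the outputs cannot directly read off any single input representation; the paper's security analysis makes this intuition precise via a computational-hardness conjecture.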
Anthology ID:
2020.findings-emnlp.123
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1368–1382
URL:
https://aclanthology.org/2020.findings-emnlp.123
DOI:
10.18653/v1/2020.findings-emnlp.123
Cite (ACL):
Yangsibo Huang, Zhao Song, Danqi Chen, Kai Li, and Sanjeev Arora. 2020. TextHide: Tackling Data Privacy in Language Understanding Tasks. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1368–1382, Online. Association for Computational Linguistics.
Cite (Informal):
TextHide: Tackling Data Privacy in Language Understanding Tasks (Huang et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.123.pdf
Video:
https://slideslive.com/38939771
Code:
Hazelsuko07/TextHide
Data:
CoLA, GLUE, MRPC, MultiNLI, QNLI, SST, SST-2