Jintao Huang
2024
Revisiting Data Reconstruction Attacks on Real-world Dataset for Federated Natural Language Understanding
Zhuo Zhang | Jintao Huang | Xiangjing Hu | Jingyuan Zhang | Yating Zhang | Hui Wang | Yue Yu | Qifan Wang | Lizhen Qu | Zenglin Xu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
With growing privacy concerns surrounding natural language understanding (NLU) applications, training high-quality models while safeguarding data privacy has become increasingly important. Federated learning (FL) offers a promising approach to collaborative model training by exchanging model gradients. However, many studies show that eavesdroppers in FL can mount sophisticated data reconstruction attacks (DRAs) to accurately reconstruct clients’ data from the shared gradients. Regrettably, current DRA methods in federated NLU have mostly been evaluated on public datasets and lack a comprehensive evaluation on real-world privacy datasets. To address this limitation, this paper reexamines the performance of these DRA methods and of the corresponding defense methods. Specifically, we introduce a novel real-world privacy dataset called FedAttack, which leads to a significant discovery: existing DRA methods usually fail to accurately recover the original text of real-world privacy data. In detail, the tokens within a recovered sentence are disordered and intertwined with tokens from other sentences in the same training batch. Moreover, our experiments demonstrate that DRA performance also varies across languages and domains. These findings lay a solid foundation for further research into more practical DRA methods and corresponding defenses.