Revisiting Data Reconstruction Attacks on Real-world Dataset for Federated Natural Language Understanding

Zhuo Zhang, Jintao Huang, Xiangjing Hu, Jingyuan Zhang, Yating Zhang, Hui Wang, Yue Yu, Qifan Wang, Lizhen Qu, Zenglin Xu


Abstract
As privacy concerns surrounding natural language understanding (NLU) applications grow, training high-quality models while safeguarding data privacy has become increasingly important. Federated learning (FL) offers a promising approach to collaborative model training by exchanging model gradients. However, many studies show that eavesdroppers in FL can mount sophisticated data reconstruction attacks (DRAs) to accurately reconstruct clients’ data from the shared gradients. Existing DRA methods for federated NLU have mostly been evaluated on public datasets and lack a comprehensive evaluation on real-world privacy datasets. To address this limitation, this paper reexamines the performance of these DRA methods and of the corresponding defense methods. Specifically, we introduce a novel real-world privacy dataset called FedAttack, which leads to a significant discovery: existing DRA methods usually fail to accurately recover the original text of real-world privacy data. In detail, the tokens within a recovered sentence are disordered and intertwined with tokens from other sentences in the same training batch. Moreover, our experiments demonstrate that DRA performance also varies across languages and domains. These findings lay a solid foundation for further research into more practical DRA methods and corresponding defenses.
Anthology ID:
2024.lrec-main.1227
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
14080–14091
URL:
https://aclanthology.org/2024.lrec-main.1227
Cite (ACL):
Zhuo Zhang, Jintao Huang, Xiangjing Hu, Jingyuan Zhang, Yating Zhang, Hui Wang, Yue Yu, Qifan Wang, Lizhen Qu, and Zenglin Xu. 2024. Revisiting Data Reconstruction Attacks on Real-world Dataset for Federated Natural Language Understanding. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14080–14091, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Revisiting Data Reconstruction Attacks on Real-world Dataset for Federated Natural Language Understanding (Zhang et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1227.pdf