@inproceedings{ma-etal-2023-fedid,
title = "{F}ed{ID}: Federated Interactive Distillation for Large-Scale Pretraining Language Models",
author = "Ma, Xinge and
Liu, Jiangming and
Wang, Jin and
Zhang, Xuejie",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.529",
doi = "10.18653/v1/2023.emnlp-main.529",
pages = "8566--8577",
abstract = "The growing concerns and regulations surrounding the protection of user data privacy have necessitated decentralized training paradigms. To this end, federated learning (FL) is widely studied in user-related natural language processing (NLP). However, it suffers from several critical limitations including extensive communication overhead, inability to handle heterogeneity, and vulnerability to white-box inference attacks. Federated distillation (FD) is proposed to alleviate these limitations, but its performance is faded by confirmation bias. To tackle this issue, we propose Federated Interactive Distillation (FedID), which utilizes a small amount of labeled data retained by the server to further rectify the local models during knowledge transfer. Additionally, based on the GLUE benchmark, we develop a benchmarking framework across multiple tasks with diverse data distributions to contribute to the research of FD in NLP community. Experiments show that our proposed FedID framework achieves the best results in homogeneous and heterogeneous federated scenarios. The code for this paper is available at: https://github.com/maxinge8698/FedID.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ma-etal-2023-fedid">
<titleInfo>
<title>FedID: Federated Interactive Distillation for Large-Scale Pretraining Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Xinge</namePart>
<namePart type="family">Ma</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiangming</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jin</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xuejie</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2023-12</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Houda</namePart>
<namePart type="family">Bouamor</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Juan</namePart>
<namePart type="family">Pino</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kalika</namePart>
<namePart type="family">Bali</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Singapore</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>The growing concerns and regulations surrounding the protection of user data privacy have necessitated decentralized training paradigms. To this end, federated learning (FL) is widely studied in user-related natural language processing (NLP). However, it suffers from several critical limitations, including extensive communication overhead, inability to handle heterogeneity, and vulnerability to white-box inference attacks. Federated distillation (FD) has been proposed to alleviate these limitations, but its performance is degraded by confirmation bias. To tackle this issue, we propose Federated Interactive Distillation (FedID), which utilizes a small amount of labeled data retained by the server to further rectify the local models during knowledge transfer. Additionally, based on the GLUE benchmark, we develop a benchmarking framework across multiple tasks with diverse data distributions to contribute to research on FD in the NLP community. Experiments show that our proposed FedID framework achieves the best results in homogeneous and heterogeneous federated scenarios. The code for this paper is available at: https://github.com/maxinge8698/FedID.</abstract>
<identifier type="citekey">ma-etal-2023-fedid</identifier>
<identifier type="doi">10.18653/v1/2023.emnlp-main.529</identifier>
<location>
<url>https://aclanthology.org/2023.emnlp-main.529</url>
</location>
<part>
<date>2023-12</date>
<extent unit="page">
<start>8566</start>
<end>8577</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T FedID: Federated Interactive Distillation for Large-Scale Pretraining Language Models
%A Ma, Xinge
%A Liu, Jiangming
%A Wang, Jin
%A Zhang, Xuejie
%Y Bouamor, Houda
%Y Pino, Juan
%Y Bali, Kalika
%S Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
%D 2023
%8 December
%I Association for Computational Linguistics
%C Singapore
%F ma-etal-2023-fedid
%X The growing concerns and regulations surrounding the protection of user data privacy have necessitated decentralized training paradigms. To this end, federated learning (FL) is widely studied in user-related natural language processing (NLP). However, it suffers from several critical limitations, including extensive communication overhead, inability to handle heterogeneity, and vulnerability to white-box inference attacks. Federated distillation (FD) has been proposed to alleviate these limitations, but its performance is degraded by confirmation bias. To tackle this issue, we propose Federated Interactive Distillation (FedID), which utilizes a small amount of labeled data retained by the server to further rectify the local models during knowledge transfer. Additionally, based on the GLUE benchmark, we develop a benchmarking framework across multiple tasks with diverse data distributions to contribute to research on FD in the NLP community. Experiments show that our proposed FedID framework achieves the best results in homogeneous and heterogeneous federated scenarios. The code for this paper is available at: https://github.com/maxinge8698/FedID.
%R 10.18653/v1/2023.emnlp-main.529
%U https://aclanthology.org/2023.emnlp-main.529
%U https://doi.org/10.18653/v1/2023.emnlp-main.529
%P 8566-8577
Markdown (Informal)
[FedID: Federated Interactive Distillation for Large-Scale Pretraining Language Models](https://aclanthology.org/2023.emnlp-main.529) (Ma et al., EMNLP 2023)
ACL
Xinge Ma, Jiangming Liu, Jin Wang, and Xuejie Zhang. 2023. FedID: Federated Interactive Distillation for Large-Scale Pretraining Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8566–8577, Singapore. Association for Computational Linguistics.
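
The abstract sketches the core FedID idea at a high level: clients exchange predictions rather than model weights, and the server uses a small labeled set it retains to correct the aggregated teacher signal before it is distilled back into the local models. The snippet below is only a minimal, hypothetical NumPy illustration of that general recipe, written from the abstract alone; the client count, dataset sizes, and the mixing weight `ALPHA` are made up, and the authors' actual implementation is in the linked repository (https://github.com/maxinge8698/FedID).

```python
# Hypothetical toy sketch of federated distillation with server-side
# rectification, written from the paper's abstract only. It is NOT the
# authors' FedID implementation (see https://github.com/maxinge8698/FedID).

import numpy as np

rng = np.random.default_rng(0)

NUM_CLIENTS = 3      # made-up number of federated clients
NUM_CLASSES = 2      # e.g. a binary GLUE-style task
TRANSFER_SIZE = 8    # shared (unlabeled) transfer examples
SERVER_LABELED = 4   # small labeled subset retained by the server
ALPHA = 0.5          # made-up weight for mixing ground truth into the teacher

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for each client's local-model logits on the shared transfer set
# (in practice these would come from fine-tuned language models).
client_logits = [rng.normal(size=(TRANSFER_SIZE, NUM_CLASSES))
                 for _ in range(NUM_CLIENTS)]

# The server aggregates the clients' predictions into a soft ensemble teacher.
teacher_probs = softmax(np.mean(client_logits, axis=0))

# Rectification step: on the few transfer examples whose labels the server
# holds, pull the teacher toward the ground truth to counter confirmation bias.
server_labels = rng.integers(0, NUM_CLASSES, size=SERVER_LABELED)
one_hot = np.eye(NUM_CLASSES)[server_labels]
teacher_probs[:SERVER_LABELED] = (
    ALPHA * one_hot + (1 - ALPHA) * teacher_probs[:SERVER_LABELED]
)

# The rectified soft labels would then be broadcast back to the clients and
# distilled into each local model (omitted here).
print(teacher_probs.round(3))
```

Exchanging soft predictions on a shared transfer set, rather than full model parameters, is what the abstract credits with reducing communication overhead and avoiding white-box access to client models; the server-side correction on its small labeled set is the step meant to counter confirmation bias.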