Open-Vocabulary Federated Learning with Multimodal Prototyping

Huimin Zeng, Zhenrui Yue, Dong Wang


Abstract
Existing federated learning (FL) studies usually assume the training label space and test label space are identical. However, in real-world applications, this assumption is too ideal to be true. A new user could come up with queries that involve data from unseen classes, and such open-vocabulary queries would directly defect such FL systems. Therefore, in this work, we explicitly focus on the under-explored open-vocabulary challenge in FL. That is, for a new user, the global server shall understand her/his query that involves arbitrary unknown classes. To address this problem, we leverage the pretrained vision-language models (VLMs). In particular, we present a novel adaptation framework tailored for VLMs in the context of FL, named as Federated Multimodal Prototyping (Fed-MP). Fed-MP adaptively aggregates the local model weights based on light-weight client residuals, and makes predictions based on a novel multimodal prototyping mechanism. Fed-MP exploits the knowledge learned from the seen classes, and robustifies the adapted VLM to unseen categories. Our empirical evaluation on various datasets validates the effectiveness of Fed-MP.
Anthology ID:
2024.naacl-long.314
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5644–5656
URL:
https://aclanthology.org/2024.naacl-long.314
DOI:
10.18653/v1/2024.naacl-long.314
Cite (ACL):
Huimin Zeng, Zhenrui Yue, and Dong Wang. 2024. Open-Vocabulary Federated Learning with Multimodal Prototyping. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5644–5656, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Open-Vocabulary Federated Learning with Multimodal Prototyping (Zeng et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.314.pdf