Modality-specific Distillation

Woojeong Jin, Maziar Sanjabi, Shaoliang Nie, Liang Tan, Xiang Ren, Hamed Firooz


Abstract
Large neural networks are impractical to deploy on mobile devices due to their heavy computational cost and slow inference. Knowledge distillation (KD) is a technique to reduce model size while retaining performance by transferring knowledge from a large “teacher” model to a smaller “student” model. However, KD on multimodal datasets such as vision-language datasets is relatively unexplored, and digesting such multimodal information is challenging because different modalities present different types of information. In this paper, we propose modality-specific distillation (MSD) to effectively transfer knowledge from a teacher on multimodal datasets. Existing KD approaches can be applied to a multimodal setup, but the student does not have access to the teacher’s modality-specific predictions. Our approach has the student mimic the teacher’s modality-specific predictions by introducing an auxiliary loss term for each modality. Because each modality has different importance for predictions, we also propose weighting approaches for the auxiliary losses, including a meta-learning approach that learns the optimal weights on these loss terms. In our experiments, we demonstrate the effectiveness of MSD and the weighting scheme and show that it achieves better performance than KD.
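The idea of one auxiliary loss term per modality can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how modality-specific predictions are obtained or which divergence is used, so the view names (`multimodal`, `image_only`, `text_only`), the KL divergence, the fixed weights, and the function `msd_loss` are all assumptions made for illustration. The meta-learned weighting is not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def msd_loss(teacher_logits, student_logits, weights, temperature=1.0):
    """Weighted sum of teacher-student divergences, one term per input view.

    teacher_logits, student_logits: dicts mapping a view name to a list of
    class logits. The view names are hypothetical: they stand for predictions
    made from the full input and from each modality alone.
    weights: dict mapping the same view names to fixed, hand-set weights
    (an assumption; the paper proposes learning them).
    """
    total = 0.0
    for view, weight in weights.items():
        p_teacher = softmax(teacher_logits[view], temperature)
        p_student = softmax(student_logits[view], temperature)
        total += weight * kl_divergence(p_teacher, p_student)
    return total


# Toy two-class example with made-up logits.
teacher = {"multimodal": [2.0, 0.5], "image_only": [1.0, 1.0], "text_only": [0.2, 1.5]}
student = {"multimodal": [0.5, 2.0], "image_only": [1.0, 1.0], "text_only": [0.2, 1.5]}
weights = {"multimodal": 1.0, "image_only": 0.5, "text_only": 0.5}
loss = msd_loss(teacher, student, weights)
```

In this toy case only the `multimodal` view differs between teacher and student, so that term alone contributes to the loss. Raising the weight of a view makes disagreement on that view cost more, which is the quantity the weighting approaches in the abstract are meant to set.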
Anthology ID:
2021.maiworkshop-1.7
Volume:
Proceedings of the Third Workshop on Multimodal Artificial Intelligence
Month:
June
Year:
2021
Address:
Mexico City, Mexico
Venues:
NAACL | maiworkshop
Publisher:
Association for Computational Linguistics
Pages:
42–53
URL:
https://aclanthology.org/2021.maiworkshop-1.7
DOI:
10.18653/v1/2021.maiworkshop-1.7
Cite (ACL):
Woojeong Jin, Maziar Sanjabi, Shaoliang Nie, Liang Tan, Xiang Ren, and Hamed Firooz. 2021. Modality-specific Distillation. In Proceedings of the Third Workshop on Multimodal Artificial Intelligence, pages 42–53, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Modality-specific Distillation (Jin et al., maiworkshop 2021)
PDF:
https://aclanthology.org/2021.maiworkshop-1.7.pdf
Data:
Hateful Memes, SNLI-VE