Distilling Multilingual Transformers into CNNs for Scalable Intent Classification

Besnik Fetahu, Akash Veeragouni, Oleg Rokhlenko, Shervin Malmasi


Abstract
We describe an application of Knowledge Distillation used to distill and deploy multilingual Transformer models for voice assistants, enabling text classification for customers globally. Transformers have set new state-of-the-art results for tasks like intent classification, and multilingual models exploit cross-lingual transfer to serve requests across 100+ languages. However, their prohibitive inference time makes them impractical to deploy in real-world scenarios with low-latency requirements, as is the case for voice assistants. We address the problem of cross-architecture distillation of multilingual Transformers into simpler models, while maintaining multilinguality without performance degradation. Training multilingual student models has received little attention, and it is our main focus. We show that a teacher-student framework, where the teacher's unscaled activations (logits) on unlabelled data supervise student training, enables distillation of Transformers into efficient multilingual CNN models. Our student model achieves performance equivalent to the teacher's, and outperforms a similar model trained directly on the labelled data used to train the teacher. This approach has enabled us to accurately serve global customer requests at speed (an 18x improvement), at scale, and at low cost.
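As a rough illustration of the logit-supervision step described in the abstract, below is a minimal PyTorch sketch of one distillation update. The model interfaces, batch format, and MSE-on-logits objective are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn

def distillation_step(teacher, student, batch, optimizer):
    """One training step: the frozen multilingual teacher scores an
    unlabelled batch, and the CNN student regresses the teacher's
    unscaled logits (MSE is one common choice of matching loss)."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(batch)   # shape: (batch_size, num_intents)
    student_logits = student(batch)       # same shape as the teacher output
    loss = nn.functional.mse_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

One motivation for matching unscaled logits rather than hard labels is that the logits encode the teacher's relative confidence across all intents, giving the student a richer training signal on unlabelled data than the original labelled set alone.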
Anthology ID:
2022.emnlp-industry.43
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
429–439
URL:
https://aclanthology.org/2022.emnlp-industry.43
DOI:
10.18653/v1/2022.emnlp-industry.43
Cite (ACL):
Besnik Fetahu, Akash Veeragouni, Oleg Rokhlenko, and Shervin Malmasi. 2022. Distilling Multilingual Transformers into CNNs for Scalable Intent Classification. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 429–439, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Distilling Multilingual Transformers into CNNs for Scalable Intent Classification (Fetahu et al., EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-industry.43.pdf