To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation

Abdul Waheed, Karima Kadaoui, Muhammad Abdul-Mageed


Abstract
Arabic is known to present unique challenges for Automatic Speech Recognition (ASR). On one hand, its rich linguistic diversity and wide range of dialects complicate the development of robust, inclusive models. On the other, current multilingual ASR models are compute-intensive and lack proper comprehensive evaluations. In light of these challenges, we distill knowledge from large teacher models into smaller student variants that are more efficient. We also introduce a novel human-annotated dataset covering five under-represented Arabic dialects for evaluation. We further evaluate both our models and existing SoTA multilingual models on both standard available benchmarks and our new dialectal data. Our best-distilled model's overall performance (45.0% WER) surpasses that of a SoTA model twice its size (SeamlessM4T-large-v2, WER=47.0%) and its teacher model (Whisper-large-v2, WER=55.1%), and its average performance on our new dialectal data (56.9% WER) outperforms all other models. To gain more insight into the poor performance of these models on dialectal data, we conduct an error analysis and report the main types of errors the different models tend to make. The GitHub repository for the project is available at https://github.com/UBC-NLP/distill-whisper-ar.
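
For readers unfamiliar with the general technique the abstract refers to, the following is a minimal, self-contained sketch of one common form of knowledge distillation for a sequence model: blending cross-entropy on the reference transcript with a temperature-scaled KL term against the teacher's token distribution. It is an illustrative assumption, not the authors' exact training recipe (that lives in the linked repository); the names alpha, temperature, and the random stand-in tensors are hypothetical.

# Hypothetical sketch of token-level knowledge distillation for an ASR decoder.
# Not the paper's exact recipe; alpha, temperature, and the toy tensors below
# are illustrative assumptions only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0, pad_id=-100):
    """Blend hard-label cross-entropy with soft-label KL against the teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    labels: (batch, seq_len) token ids; pad_id marks ignored positions.
    """
    vocab = student_logits.size(-1)

    # Hard-label term: standard cross-entropy on the reference transcript.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         labels.view(-1), ignore_index=pad_id)

    # Soft-label term: KL divergence to the teacher's temperature-scaled
    # distribution, rescaled by T^2 as is conventional.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    return alpha * ce + (1.0 - alpha) * kd

# Toy usage with random tensors standing in for decoder outputs of a frozen
# teacher (e.g., Whisper-large-v2) and a smaller student.
if __name__ == "__main__":
    B, T, V = 2, 10, 100
    student = torch.randn(B, T, V, requires_grad=True)
    teacher = torch.randn(B, T, V)
    labels = torch.randint(0, V, (B, T))
    loss = distillation_loss(student, teacher, labels)
    loss.backward()
    print(f"KD loss: {loss.item():.3f}")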
Anthology ID:
2024.acl-long.680
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
12603–12621
URL:
https://aclanthology.org/2024.acl-long.680
Cite (ACL):
Abdul Waheed, Karima Kadaoui, and Muhammad Abdul-Mageed. 2024. To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12603–12621, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation (Waheed et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.680.pdf