A Few-Shot Multi-Accented Speech Classification for Indian Languages using Transformers and LLM’s Fine-Tuning Approaches

Jairam R, Jyothish G, Premjith B


Abstract
Accented speech classification plays a vital role in the advancement of high-quality automatic speech recognition (ASR) technology. For certain applications, like multi-accented speech classification, it is not always viable to obtain data with accent variation, especially for resource-poor languages. This is one of the major reasons that contributes to the underperformance of the speech classification systems. Therefore, in order to handle speech variability in Indian language speaker accents, we propose a few-shot learning paradigm in this study. It learns generic feature embeddings using an encoder from a pre-trained whisper model and a classification head for classification. The model is refined using LLM’s fine-tuning techniques, such as LoRA and QLoRA, for the six Indian English accents in the Indic Accent Dataset. The experimental findings show that the accuracy of the model is greatly increased by the few-shot learning paradigm’s effectiveness combined with LLM’s fine-tuning techniques. In optimal settings, the model’s accuracy can reach 94% when the trainable parameters are set to 5%.
Anthology ID:
2024.dravidianlangtech-1.1
Volume:
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Rajeswari Nadarajan, Manikandan Ravikiran
Venues:
DravidianLangTech | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1–9
Language:
URL:
https://aclanthology.org/2024.dravidianlangtech-1.1
DOI:
Bibkey:
Cite (ACL):
Jairam R, Jyothish G, and Premjith B. 2024. A Few-Shot Multi-Accented Speech Classification for Indian Languages using Transformers and LLM’s Fine-Tuning Approaches. In Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 1–9, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
A Few-Shot Multi-Accented Speech Classification for Indian Languages using Transformers and LLM’s Fine-Tuning Approaches (R et al., DravidianLangTech-WS 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.dravidianlangtech-1.1.pdf