Benchmarking Whisper for Low-Resource Speech Recognition: An N-Shot Evaluation on Pashto, Punjabi, and Urdu

Najm Ul Sehar; Ayesha Khalid; Farah Adeeba; Sarmad Hussain

Abstract

Whisper, a large-scale multilingual model, has demonstrated strong performance in speech recognition benchmarks, but its effectiveness on low-resource languages remains under-explored. This paper evaluates Whisper’s performance on Pashto, Punjabi, and Urdu, three underrepresented languages. While Automatic Speech Recognition (ASR) has advanced for widely spoken languages, low-resource languages still face challenges due to limited data. Whisper’s zero-shot performance was benchmarked and then its small variant was fine-tuned to improve transcription accuracy. Significant reductions in Word Error Rate (WER) were achieved through few-shot fine-tuning, which helped the model better handle challenges such as complex phonetic structures, compared to zero-shot performance. This study contributes to improving multilingual ASR for low-resource languages and highlights Whisper’s adaptability and potential for further enhancement.
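
For readers unfamiliar with the metric named above, the following is a minimal sketch of how Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the paper's own evaluation pipeline and any text normalization it applies are not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and may differ from the paper's setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.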

Anthology ID: 2025.chipsal-1.20
Volume: Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Kengatharaiyer Sarveswaran, Ashwini Vaidya, Bal Krishna Bal, Sana Shams, Surendrabikram Thapa
Venues: CHiPSAL | WS
Publisher: International Committee on Computational Linguistics
Pages: 202–207
URL: https://aclanthology.org/2025.chipsal-1.20/
Cite (ACL): Najm Ul Sehar, Ayesha Khalid, Farah Adeeba, and Sarmad Hussain. 2025. Benchmarking Whisper for Low-Resource Speech Recognition: An N-Shot Evaluation on Pashto, Punjabi, and Urdu. In Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), pages 202–207, Abu Dhabi, UAE. International Committee on Computational Linguistics.
Cite (Informal): Benchmarking Whisper for Low-Resource Speech Recognition: An N-Shot Evaluation on Pashto, Punjabi, and Urdu (Sehar et al., CHiPSAL 2025)
PDF: https://aclanthology.org/2025.chipsal-1.20.pdf
