Evaluating Open-Source ASR Systems: Performance Across Diverse Audio Conditions and Error Correction Methods

Saki Imai, Tahiya Chowdhury, Amanda J. Stent


Abstract
Despite significant advances in automatic speech recognition (ASR) accuracy, challenges remain. Naturally occurring conversation often involves multiple overlapping speakers of different ages, accents and genders, as well as noisy environments and suboptimal audio recording equipment, all of which reduce ASR accuracy. In this study, we evaluate the accuracy of state-of-the-art open-source ASR systems across diverse conversational speech datasets, examining the impact of audio and speaker characteristics on word error rate (WER). We then explore the potential of ASR ensembling and post-ASR correction methods to improve transcription accuracy. Our findings emphasize the need for robust error correction techniques and for continued work on demographic biases to enhance ASR performance and inclusivity.
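For readers unfamiliar with the metric named in the abstract, WER (word error rate) is conventionally the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For instance, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.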
Anthology ID:
2025.coling-main.336
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
5027–5039
URL:
https://aclanthology.org/2025.coling-main.336/
Cite (ACL):
Saki Imai, Tahiya Chowdhury, and Amanda J. Stent. 2025. Evaluating Open-Source ASR Systems: Performance Across Diverse Audio Conditions and Error Correction Methods. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5027–5039, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Evaluating Open-Source ASR Systems: Performance Across Diverse Audio Conditions and Error Correction Methods (Imai et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.336.pdf