A Benchmark of French ASR Systems Based on Error Severity

Antoine Tholly, Jane Wottawa, Mickael Rouvier, Richard Dufour


Abstract
Automatic Speech Recognition (ASR) transcription errors are commonly assessed using metrics that compare the hypothesis with a reference transcription, such as Word Error Rate (WER), which measures word-level deviations from the reference, or semantic score-based metrics. However, these approaches often overlook what is understandable to humans when interpreting transcription errors. To address this limitation, a new evaluation is proposed that categorizes errors into four levels of severity, further divided into subtypes, based on objective linguistic criteria, contextual patterns, and the use of content words as the unit of analysis. This metric is applied to a benchmark of 10 state-of-the-art ASR systems for French, encompassing both HMM-based and end-to-end models. Our findings reveal the strengths and weaknesses of each system, identifying those that provide the most comfortable reading experience for users.
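For readers unfamiliar with the baseline metric the abstract contrasts against, the following is a minimal, illustrative Python sketch of word-level WER (edit distance over words, divided by the number of reference words). It is not the paper's severity-based metric; the function name and the example French sentences are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] = word-level edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: one substituted content word out of six reference words.
print(wer("le chat dort sur le canapé", "le chat sort sur le canapé"))  # ≈ 0.167
```

Note that WER scores this substitution the same as any other, regardless of how severely it disrupts comprehension; the severity-based evaluation proposed in the paper is designed to capture exactly that distinction.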
Anthology ID: 2025.coling-main.341
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 5094–5101
URL: https://aclanthology.org/2025.coling-main.341/
Cite (ACL): Antoine Tholly, Jane Wottawa, Mickael Rouvier, and Richard Dufour. 2025. A Benchmark of French ASR Systems Based on Error Severity. In Proceedings of the 31st International Conference on Computational Linguistics, pages 5094–5101, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): A Benchmark of French ASR Systems Based on Error Severity (Tholly et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.341.pdf