Evaluating and Improving Automatic Speech Recognition using Severity

Ryan Whetten, Casey Kennington


Abstract
A common metric for evaluating Automatic Speech Recognition (ASR) is Word Error Rate (WER), which accounts only for discrepancies at the word level. Although useful, WER is not guaranteed to correlate well with human judgment or with performance on downstream tasks that use ASR. Meaningful assessment of ASR mistakes becomes even more important in high-stakes scenarios such as healthcare. We propose two general measures to evaluate the severity of mistakes made by ASR systems, one based on sentiment analysis and another based on text embeddings. We evaluate these measures on simulated patient-doctor conversations using five ASR systems. Results show that these measures capture characteristics of ASR errors that WER does not. Furthermore, we train an ASR system incorporating severity and demonstrate the potential for using severity not only in the evaluation, but also in the development of ASR. Advantages and limitations of this methodology are analyzed and discussed.
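The embedding-based severity measure described in the abstract can be illustrated with a minimal sketch: score an ASR hypothesis by its semantic distance from the reference transcript rather than by word-level edits. The paper uses text embeddings; as an assumption for this self-contained example, a bag-of-words count vector stands in for a learned sentence embedding, and the function names are hypothetical.

```python
# Hedged sketch of an embedding-style severity score (1 - cosine similarity).
# Assumption: bag-of-words vectors stand in for the paper's text embeddings.
from collections import Counter
from math import sqrt

def bow_vector(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def embedding_severity(reference, hypothesis):
    """Severity of an ASR error: 1 minus similarity to the reference.
    An identical transcript scores ~0; a meaning-changing one scores higher."""
    return 1.0 - cosine_similarity(bow_vector(reference), bow_vector(hypothesis))

ref = "the patient denies chest pain"
benign = "the patient denies the chest pain"   # harmless insertion
severe = "the patient has chest pain"          # meaning-flipping substitution
print(embedding_severity(ref, benign) < embedding_severity(ref, severe))  # → True
```

The point the sketch makes is the one the abstract argues: both hypotheses differ from the reference by a single word, so WER treats them similarly, but the meaning-flipping substitution receives a larger severity score.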
Anthology ID:
2023.bionlp-1.6
Volume:
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kevin Cohen
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
79–91
URL:
https://aclanthology.org/2023.bionlp-1.6
DOI:
10.18653/v1/2023.bionlp-1.6
Cite (ACL):
Ryan Whetten and Casey Kennington. 2023. Evaluating and Improving Automatic Speech Recognition using Severity. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 79–91, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Evaluating and Improving Automatic Speech Recognition using Severity (Whetten & Kennington, BioNLP 2023)
PDF:
https://aclanthology.org/2023.bionlp-1.6.pdf