WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue

Zachary Ellis, Jared Joselowitz, Yash Deo, Yajie Vera He, Anna Kalygina, Aisling Higham, Mana Rahimzadeh, Yan Jia, Ibrahim Habli, Ernest Lim


Abstract
As Automatic Speech Recognition (ASR) is increasingly deployed in clinical dialogue, standard evaluations still rely heavily on Word Error Rate (WER). This paper challenges that standard, investigating whether WER or other common metrics correlate with the clinical impact of transcription errors. We establish a gold-standard benchmark by having expert clinicians compare ground-truth utterances to their ASR-generated counterparts, labeling the clinical impact of any discrepancies found in two distinct doctor-patient dialogue datasets. Our analysis reveals that WER and a comprehensive suite of existing metrics correlate poorly with the clinician-assigned risk labels (No, Minimal, or Significant Impact). To bridge this evaluation gap, we introduce an LLM-as-a-Judge, programmatically optimized using GEPA to replicate expert clinical assessment. The optimized judge (Gemini-2.5-Pro) achieves human-comparable performance, obtaining 90% accuracy and a strong Cohen’s kappa of 0.816. This work provides a validated, automated framework for moving ASR evaluation beyond simple textual fidelity to a necessary, scalable assessment of safety in clinical dialogue.
Anthology ID:
2026.iwsds-1.39
Volume:
Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
Month:
February
Year:
2026
Address:
Trento, Italy
Editors:
Giuseppe Riccardi, Seyed Mahed Mousavi, Maria Ines Torres, Koichiro Yoshino, Zoraida Callejas, Shammur Absar Chowdhury, Yun-Nung Chen, Frederic Bechet, Joakim Gustafson, Géraldine Damnati, Alex Papangelis, Luis Fernando D’Haro, John Mendonça, Raffaella Bernardi, Dilek Hakkani-Tur, Giuseppe "Pino" Di Fabbrizio, Tatsuya Kawahara, Firoj Alam, Gokhan Tur, Michael Johnston
Venue:
IWSDS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
391–417
Language:
URL:
https://aclanthology.org/2026.iwsds-1.39/
DOI:
Bibkey:
Cite (ACL):
Zachary Ellis, Jared Joselowitz, Yash Deo, Yajie Vera He, Anna Kalygina, Aisling Higham, Mana Rahimzadeh, Yan Jia, Ibrahim Habli, and Ernest Lim. 2026. WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue. In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 391–417, Trento, Italy. Association for Computational Linguistics.
Cite (Informal):
WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue (Ellis et al., IWSDS 2026)
Copy Citation:
PDF:
https://aclanthology.org/2026.iwsds-1.39.pdf