Meizhu Liu
2026
Synthetic Doctor-Patient Dialogue Generation for Robust Medical ASR: A Scalable Pipeline for Vocabulary Expansion and Privacy Preservation
Kefei Liu | Meizhu Liu
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Automatic Speech Recognition (ASR) is increasingly integral to healthcare services, where medical conversations pose unique transcription challenges due to specialized terminology and the frequent introduction of new terms. Existing ASR models, including widely used systems such as Whisper, suffer high word error rates (WER) on clinical vocabulary, especially medication names, primarily because annotated audio-transcript data is scarce in the medical domain. This paper proposes and evaluates a novel synthetic data generation pipeline that produces comprehensive doctor-patient dialogues in both text and audio form, specifically targeting a curated set of over 124,000 medical terms. The pipeline generated over 1 billion audio recordings with ground-truth transcriptions. Fine-tuning ASR models on this synthetic corpus substantially reduced overall WER and improved transcription accuracy on medical terms, marking a significant advance in healthcare ASR. The data generation code, dataset, and training and evaluation scripts are released.
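The abstract describes a term-driven pipeline: for each curated medical term, generate a doctor-patient dialogue in text, synthesize it to audio, and keep the text as the ground-truth transcript for fine-tuning. A minimal sketch of that flow follows; all names (`make_dialogue`, `synthesize_audio`, `build_corpus`) are illustrative assumptions, and the LLM dialogue generator and TTS engine are stubbed with placeholders, since the paper's actual components are not specified here.

```python
# Illustrative sketch of a term-targeted synthetic dialogue pipeline.
# The real pipeline would call an LLM for dialogue text and a TTS engine
# for audio; both are stubbed so the sketch is self-contained.

from dataclasses import dataclass

@dataclass
class Sample:
    text: str    # ground-truth transcript
    audio: bytes # synthesized waveform (placeholder bytes here)

def make_dialogue(term: str) -> str:
    # Stand-in for LLM generation: weave the medical term into
    # a short doctor-patient exchange.
    return (f"Doctor: I'm going to prescribe {term}. "
            f"Patient: How often should I take {term}?")

def synthesize_audio(text: str) -> bytes:
    # Stand-in for TTS synthesis.
    return text.encode("utf-8")

def build_corpus(terms):
    # One (audio, transcript) training pair per medical term.
    samples = []
    for term in terms:
        dialogue = make_dialogue(term)
        samples.append(Sample(text=dialogue, audio=synthesize_audio(dialogue)))
    return samples

corpus = build_corpus(["metformin", "lisinopril"])
```

Because every transcript is generated before the audio, the labels are exact by construction, which is what lets the corpus scale without manual annotation.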
No Label? No Problem: Unsupervised Continual Learning for Adaptive Medical ASR
Meizhu Liu | Tao Sheng
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Automatic Speech Recognition (ASR) plays an important role in healthcare but faces unique challenges. Medical audio often contains specialized terminology, such as medication names, which existing ASR systems struggle to transcribe accurately. High error rates arise from pronunciation variability, the continual introduction of new terms, and the scarcity of high-quality labeled data, whose collection is costly and requires medical expertise. Although synthetic datasets partially alleviate this problem, they fail to capture the noise and variability of real-world recordings. Moreover, ASR models trained in controlled environments are highly sensitive to noise, leading to degraded performance in clinical settings. To address these limitations, we propose an unsupervised continual learning ASR framework that adapts to new data while preserving prior knowledge, enabling efficient domain adaptation without extensive retraining. Experiments on real-world medical audio demonstrate significant improvements over state-of-the-art baselines.
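The abstract does not specify the framework's mechanics, but a common pattern for unsupervised continual adaptation that "adapts to new data while preserving prior knowledge" combines pseudo-labeling of incoming unlabeled audio with replay of earlier examples. The sketch below illustrates only that general pattern under those assumptions; `ContinualASR`, `transcribe`, and `adapt` are hypothetical names, not the paper's API, and the gradient update is left as a comment.

```python
# Generic sketch of unsupervised continual adaptation via pseudo-labeling
# plus a replay buffer (one common recipe; not necessarily the paper's).

import random

class ContinualASR:
    def __init__(self, buffer_size=1000):
        self.replay = []          # (audio, transcript) pairs kept for replay
        self.buffer_size = buffer_size

    def transcribe(self, audio):
        # Stand-in for the ASR model's decoding step.
        return f"<hypothesis for {audio}>"

    def adapt(self, unlabeled_batch, replay_frac=0.5):
        # 1) Pseudo-label new, unlabeled clinical audio with the current model.
        new_pairs = [(a, self.transcribe(a)) for a in unlabeled_batch]
        # 2) Mix in replayed past pairs to mitigate catastrophic forgetting.
        k = int(len(new_pairs) * replay_frac)
        old_pairs = random.sample(self.replay, min(k, len(self.replay)))
        train_batch = new_pairs + old_pairs
        # 3) A real system would take a fine-tuning step on train_batch here.
        # 4) Update the bounded replay buffer with the newest pairs.
        self.replay.extend(new_pairs)
        self.replay = self.replay[-self.buffer_size:]
        return train_batch
```

The appeal of this family of methods is that no human labels are needed at adaptation time: the model supervises itself on new audio, while the replay buffer anchors it to previously learned vocabulary.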