Tyler Vuong


pdf bib
AdaBERT-CTC: Leveraging BERT-CTC for Text-Only Domain Adaptation in ASR
Tyler Vuong | Karel Mundnich | Dhanush Bekal | Veera Elluru | Srikanth Ronanki | Sravan Bodapati
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

End-to-end (E2E) automatic speech recognition (ASR) models are becoming increasingly popular in commercial applications, such as virtual assistants, closed captioning, and dictation systems. The accuracy of the ASR is crucial to their success. However, E2E models still struggle to recognize out-of-domain words such as proper nouns and domain-specific terms. In this paper we introduce AdaBERT-CTC, a domain adaptation technique that relies solely on textual data. Our method allows for text-only adaptation by fine-tuning a pre-trained self-supervised text encoder model. Additionally, we show that our method can be made parameter-efficient by adding bottleneck adapters to the pre-trained model. This allows for adaptation with less than a 5% increase in parameters and minimal computational overhead during inference. We demonstrate that our approach outperforms the base BERT-CTC model by up to 14% relative word error rate improvement on several out-of-domain, publicly available datasets.