Bhuvana Ramabhadran

Also published as: B. Ramabhadran


2025

LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors
Rao Ma | Tongzhou Chen | Kartik Audhkhasi | Bhuvana Ramabhadran
Findings of the Association for Computational Linguistics: EMNLP 2025

Large-scale pre-trained speech encoders and Large Language Models (LLMs) have recently been released that achieve state-of-the-art performance on a range of spoken language processing tasks, including Automatic Speech Recognition (ASR). To combine the two model families effectively, continuous speech prompts and ASR error correction have been adopted; however, these methods are prone to suboptimal performance or are inflexible. In this paper, we propose a new paradigm, LegoSLM, that bridges speech encoders and LLMs using ASR posterior matrices. The speech encoder is trained to generate Connectionist Temporal Classification (CTC) posteriors over the LLM vocabulary, which are used to reconstruct pseudo-audio embeddings by computing a weighted sum of the LLM input embeddings. These embeddings are concatenated with text embeddings in the LLM input space. Using the well-performing USM and Gemma models as an example, we demonstrate that LegoSLM yields good performance on both ASR and speech translation tasks. By connecting USM with Gemma models, we obtain an average 49% WER reduction (WERR) over the USM-CTC baseline on eight MLS test sets. The trained model also exhibits modularity across a range of settings: after fine-tuning the Gemma model weights, the speech encoder can be swapped and combined with the LLM in a zero-shot fashion. Additionally, we propose controlling the decode-time influence of the USM and LLM with a softmax temperature, which proves effective for domain adaptation.
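The bridging step described in the abstract reduces to a temperature-scaled softmax over the LLM vocabulary followed by a weighted sum of the LLM's input embedding table. Below is a minimal PyTorch sketch of that computation, assuming the CTC posteriors arrive as frame-level logits of shape (T, V); the function and variable names are illustrative, not taken from the paper's code.

```python
import torch

def pseudo_audio_embeddings(ctc_logits: torch.Tensor,
                            llm_embedding_table: torch.Tensor,
                            temperature: float = 1.0) -> torch.Tensor:
    """Reconstruct pseudo-audio embeddings from CTC posteriors.

    ctc_logits:           (T, V) frame-level CTC logits over the LLM vocabulary
                          (assumed unnormalized; if given probabilities, apply
                          log first before temperature scaling)
    llm_embedding_table:  (V, D) the LLM's input embedding matrix
    temperature:          softmax temperature; lower values sharpen the speech
                          encoder's predictions, the decode-time knob the
                          abstract uses to balance USM vs. LLM influence
    """
    # Temperature-scaled softmax over the LLM vocabulary at each frame.
    weights = torch.softmax(ctc_logits / temperature, dim=-1)   # (T, V)
    # Weighted sum of LLM input embeddings: one pseudo-embedding per frame.
    return weights @ llm_embedding_table                        # (T, D)

# Usage sketch: prepend the pseudo-audio embeddings to the text prompt
# embeddings in the LLM input space before the forward pass, e.g.
#   inputs = torch.cat([pseudo_audio, text_embeddings], dim=0)
```

Because the pseudo-embeddings live entirely in the LLM's own input space, the speech encoder can be swapped without retraining the LLM, which is the modularity property the abstract highlights.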

2012

Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
Bhuvana Ramabhadran | Sanjeev Khudanpur | Ebru Arisoy
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT

Deep Neural Network Language Models
Ebru Arisoy | Tara N. Sainath | Brian Kingsbury | Bhuvana Ramabhadran
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT

2010

Unsupervised Model Adaptation using Information-Theoretic Criterion
Ariya Rastrow | Frederick Jelinek | Abhinav Sethy | Bhuvana Ramabhadran
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Fast decoding for open vocabulary spoken term detection
Bhuvana Ramabhadran | Abhinav Sethy | Jonathan Mamou | Brian Kingsbury | Upendra Chaudhari
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2006

Automated Quality Monitoring for Call Centers using Speech and NLP Technologies
G. Zweig | O. Siohan | G. Saon | B. Ramabhadran | D. Povey | L. Mangu | B. Kingsbury
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Demonstrations