RECAST: Retrieval-Augmented Contextual ASR via Decoder-State Keyword Spotting

Ashish Mittal; Sunita Sarawagi; Preethi Jyothi

doi:10.18653/v1/2025.findings-emnlp.203

Abstract

Contextual biasing in ASR systems is critical for recognizing rare, domain-specific terms but becomes impractical with large keyword dictionaries due to prompt size and latency constraints. We present RECAST, a lightweight retrieval-augmented approach that repurposes decoder states of a pretrained ASR model to retrieve relevant keywords without requiring audio exemplars. RECAST introduces a contrastively trained retriever that aligns decoder-state embeddings with textual keyword representations, enabling fast token-level retrieval over large dictionaries. Retrieved keywords are ranked and formatted into a prompt to guide a downstream speech language model. Trained solely on LibriSpeech and evaluated on out-of-domain benchmarks covering up to 4,000 keywords across diverse domains, RECAST consistently outperforms full-list prompt biasing and strong phonetic/text baselines. It achieves up to 54.3% relative reduction in entity WER and 41.3% overall WER improvement over the baseline, along with up to 2.5x higher recall in challenging settings. Furthermore, RECAST remains effective for diverse languages such as Hindi, demonstrating its scalability, language-agnostic design, and practicality for real-world contextual ASR.
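
The retrieve-rank-prompt flow summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the cosine scoring, the max-over-tokens aggregation, the prompt template, and the names `retrieve_keywords` and `format_prompt` are all assumptions made for illustration, and the vectors are hand-picked stand-ins for decoder-state and keyword embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_keywords(decoder_states, keyword_embeddings, top_k=2):
    """Rank keywords by their best match against any decoder state.

    Assumption: each keyword is scored by the maximum cosine similarity
    over all token-level decoder states; the paper may aggregate differently.
    """
    scores = {
        kw: max((cosine(state, emb) for state in decoder_states), default=-1.0)
        for kw, emb in keyword_embeddings.items()
    }
    ranked = sorted(scores, key=lambda kw: scores[kw], reverse=True)
    return ranked[:top_k]


def format_prompt(keywords):
    """Hypothetical prompt template for the downstream speech language model."""
    return "Keywords: " + ", ".join(keywords)


# Toy data: two token-level decoder states and a three-entry dictionary.
decoder_states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1]]
keyword_embeddings = {
    "angioplasty": [0.9, 0.1, 0.0],
    "Suzhou": [0.0, 0.8, 0.2],
    "photosynthesis": [0.0, 0.0, 1.0],
}

retrieved = retrieve_keywords(decoder_states, keyword_embeddings, top_k=2)
prompt = format_prompt(retrieved)
print(prompt)
```

With a dictionary of thousands of keywords, the linear scan here would typically be replaced by a batched matrix product or an approximate nearest-neighbor index; only the short retrieved list, rather than the full dictionary, would then reach the prompt.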

Anthology ID: 2025.findings-emnlp.203
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3780–3793
URL: https://aclanthology.org/2025.findings-emnlp.203/
DOI: 10.18653/v1/2025.findings-emnlp.203
Cite (ACL): Ashish Mittal, Sunita Sarawagi, and Preethi Jyothi. 2025. RECAST: Retrieval-Augmented Contextual ASR via Decoder-State Keyword Spotting. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 3780–3793, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): RECAST: Retrieval-Augmented Contextual ASR via Decoder-State Keyword Spotting (Mittal et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.203.pdf
Checklist: 2025.findings-emnlp.203.checklist.pdf
