RECAST: Retrieval-Augmented Contextual ASR via Decoder-State Keyword Spotting

Ashish Mittal, Sunita Sarawagi, Preethi Jyothi


Abstract
Contextual biasing in ASR systems is critical for recognizing rare, domain-specific terms, but it becomes impractical with large keyword dictionaries because of prompt-size and latency constraints. We present RECAST, a lightweight retrieval-augmented approach that repurposes the decoder states of a pretrained ASR model to retrieve relevant keywords without requiring audio exemplars. RECAST introduces a contrastively trained retriever that aligns decoder-state embeddings with textual keyword representations, enabling fast token-level retrieval over large dictionaries. Retrieved keywords are ranked and formatted into a prompt that guides a downstream speech language model. Trained solely on LibriSpeech and evaluated on out-of-domain benchmarks covering up to 4,000 keywords across diverse domains, RECAST consistently outperforms full-list prompt biasing and strong phonetic and text baselines. It achieves up to a 54.3% relative reduction in entity WER and a 41.3% overall WER improvement over the baseline, along with up to 2.5× higher recall in challenging settings. Furthermore, RECAST remains effective for other languages such as Hindi, demonstrating its scalability, language-agnostic design, and practicality for real-world contextual ASR.
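The retrieve-rank-prompt pipeline summarized in the abstract can be illustrated with a minimal, pure-Python sketch. It assumes that decoder states and keyword embeddings already share one vector space (in RECAST that alignment comes from the contrastively trained retriever). It scores each keyword by its best cosine match over the decoder's token-level states, keeps the top-k, and formats them into a prompt. All function names, the max-over-tokens scoring rule, the threshold, the temperature, and the prompt template are hypothetical illustrations, not the authors' implementation. `info_nce_loss` shows a standard in-batch InfoNCE objective as one plausible form of the contrastive training.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(anchors: Sequence[Sequence[float]],
                  positives: Sequence[Sequence[float]],
                  temperature: float = 0.1) -> float:
    """Hypothetical in-batch InfoNCE: anchors[i] (e.g. a decoder-state span)
    should score highest against positives[i] (its keyword embedding); every
    other positive in the batch acts as a negative. Returns mean cross-entropy."""
    a = [normalize(x) for x in anchors]
    p = [normalize(x) for x in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]
    return total / len(a)


def retrieve_keywords(decoder_states: Sequence[Sequence[float]],
                      keyword_embeddings: Dict[str, Sequence[float]],
                      top_k: int = 3,
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Score each keyword by its best cosine match over the decoder's
    token-level states, drop scores below `threshold`, return the top-k."""
    if not decoder_states:
        return []
    states = [normalize(s) for s in decoder_states]
    scored = []
    for keyword, emb in keyword_embeddings.items():
        e = normalize(emb)
        best = max(dot(s, e) for s in states)
        if best >= threshold:
            scored.append((keyword, best))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


def build_prompt(ranked: Sequence[Tuple[str, float]]) -> str:
    """Format the ranked keywords into a (hypothetical) biasing prompt."""
    terms = ", ".join(keyword for keyword, _ in ranked)
    return f"Transcribe the audio. Relevant terms: {terms}."


if __name__ == "__main__":
    # Toy 3-d embeddings standing in for learned decoder/keyword vectors.
    decoder_states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    keyword_embeddings = {
        "Suzhou": [0.8, 0.2, 0.0],
        "RECAST": [0.0, 1.0, 0.0],
        "LibriSpeech": [0.0, 0.0, 1.0],
    }
    ranked = retrieve_keywords(decoder_states, keyword_embeddings, top_k=2)
    print(ranked)
    print(build_prompt(ranked))  # Transcribe the audio. Relevant terms: RECAST, Suzhou.
```

In the toy example, the two decoder states point toward "RECAST" and "Suzhou", so those two keywords are retrieved while the unrelated "LibriSpeech" entry falls below the threshold. Scoring each keyword by its best-matching token, rather than against one pooled utterance vector, is one simple way to realize token-level retrieval; a dictionary with thousands of entries would more likely use an approximate nearest-neighbor index than this linear scan.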
Anthology ID:
2025.findings-emnlp.203
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3780–3793
URL:
https://aclanthology.org/2025.findings-emnlp.203/
Cite (ACL):
Ashish Mittal, Sunita Sarawagi, and Preethi Jyothi. 2025. RECAST: Retrieval-Augmented Contextual ASR via Decoder-State Keyword Spotting. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 3780–3793, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
RECAST: Retrieval-Augmented Contextual ASR via Decoder-State Keyword Spotting (Mittal et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.203.pdf
Checklist:
 2025.findings-emnlp.203.checklist.pdf