@inproceedings{talafha-etal-2026-zero,
title = "Zero-Shot Context-Aware {ASR} for Diverse {A}rabic Varieties",
author = "Talafha, Bashar and
Alhassan, Amin Abu and
Abdul-Mageed, Muhammad",
editor = "Liakata, Maria and
Moreira, Viviane P. and
Zhang, Jiajun and
Jurgens, David",
booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {ACL} 2026",
month = jul,
year = "2026",
address = "San Diego, California, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.findings-acl.1296/",
pages = "26029--26044",
ISBN = "979-8-89176-395-1",
abstract = "Zero-shot ASR for Arabic remains challenging: while multilingual models perform well on Modern Standard Arabic (MSA), error rates rise sharply on dialectal and accented speech due to linguistic mismatch and scarce labeled data. We study \textit{context-aware} decoding as a lightweight test-time adaptation paradigm that conditions inference on external side information without parameter updates. For promptable encoder{--}decoder ASR (e.g., Whisper), we incorporate context through (i) decoder prompting with first-pass hypotheses and (ii) encoder/decoder prefixing with retrieved speech-text exemplars, complemented by simple prompt reordering and optional speaker-matched synthetic exemplars to improve robustness in informal and multi-speaker settings. To extend contextual adaptation beyond promptable architectures, we introduce \textit{proxy-guided $n$-best selection} for CTC ASR: given one or more external proxy hypotheses, we select from a model{'}s $n$-best list by minimizing text-level distance to the proxies, enabling contextual inference without direct prompting. Across ten Arabic conditions spanning MSA, accented MSA, and multiple dialects, the best-performing context-aware variants yield average relative WER reductions of 22.29{\%} on MSA, 20.54{\%} on accented MSA, and 9.15{\%} on dialectal Arabic. For CTC ASR on our Common Voice MSA testbed, proxy-guided selection reduces WER by 15.6{\%} relative and recovers a substantial fraction of oracle $n$-best gains, showing that external-context guidance can also benefit non-promptable ASR."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="talafha-etal-2026-zero">
<titleInfo>
<title>Zero-Shot Context-Aware ASR for Diverse Arabic Varieties</title>
</titleInfo>
<name type="personal">
<namePart type="given">Bashar</namePart>
<namePart type="family">Talafha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Amin</namePart>
<namePart type="given">Abu</namePart>
<namePart type="family">Alhassan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Muhammad</namePart>
<namePart type="family">Abdul-Mageed</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2026-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2026</title>
</titleInfo>
<name type="personal">
<namePart type="given">Maria</namePart>
<namePart type="family">Liakata</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Viviane</namePart>
<namePart type="given">P</namePart>
<namePart type="family">Moreira</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiajun</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="family">Jurgens</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">San Diego, California, United States</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-395-1</identifier>
</relatedItem>
<abstract>Zero-shot ASR for Arabic remains challenging: while multilingual models perform well on Modern Standard Arabic (MSA), error rates rise sharply on dialectal and accented speech due to linguistic mismatch and scarce labeled data. We study context-aware decoding as a lightweight test-time adaptation paradigm that conditions inference on external side information without parameter updates. For promptable encoder–decoder ASR (e.g., Whisper), we incorporate context through (i) decoder prompting with first-pass hypotheses and (ii) encoder/decoder prefixing with retrieved speech-text exemplars, complemented by simple prompt reordering and optional speaker-matched synthetic exemplars to improve robustness in informal and multi-speaker settings. To extend contextual adaptation beyond promptable architectures, we introduce proxy-guided n-best selection for CTC ASR: given one or more external proxy hypotheses, we select from a model’s n-best list by minimizing text-level distance to the proxies, enabling contextual inference without direct prompting. Across ten Arabic conditions spanning MSA, accented MSA, and multiple dialects, the best-performing context-aware variants yield average relative WER reductions of 22.29% on MSA, 20.54% on accented MSA, and 9.15% on dialectal Arabic. For CTC ASR on our Common Voice MSA testbed, proxy-guided selection reduces WER by 15.6% relative and recovers a substantial fraction of oracle n-best gains, showing that external-context guidance can also benefit non-promptable ASR.</abstract>
<identifier type="citekey">talafha-etal-2026-zero</identifier>
<location>
<url>https://aclanthology.org/2026.findings-acl.1296/</url>
</location>
<part>
<date>2026-07</date>
<extent unit="page">
<start>26029</start>
<end>26044</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Zero-Shot Context-Aware ASR for Diverse Arabic Varieties
%A Talafha, Bashar
%A Alhassan, Amin Abu
%A Abdul-Mageed, Muhammad
%Y Liakata, Maria
%Y Moreira, Viviane P.
%Y Zhang, Jiajun
%Y Jurgens, David
%S Findings of the Association for Computational Linguistics: ACL 2026
%D 2026
%8 July
%I Association for Computational Linguistics
%C San Diego, California, United States
%@ 979-8-89176-395-1
%F talafha-etal-2026-zero
%X Zero-shot ASR for Arabic remains challenging: while multilingual models perform well on Modern Standard Arabic (MSA), error rates rise sharply on dialectal and accented speech due to linguistic mismatch and scarce labeled data. We study context-aware decoding as a lightweight test-time adaptation paradigm that conditions inference on external side information without parameter updates. For promptable encoder–decoder ASR (e.g., Whisper), we incorporate context through (i) decoder prompting with first-pass hypotheses and (ii) encoder/decoder prefixing with retrieved speech-text exemplars, complemented by simple prompt reordering and optional speaker-matched synthetic exemplars to improve robustness in informal and multi-speaker settings. To extend contextual adaptation beyond promptable architectures, we introduce proxy-guided n-best selection for CTC ASR: given one or more external proxy hypotheses, we select from a model’s n-best list by minimizing text-level distance to the proxies, enabling contextual inference without direct prompting. Across ten Arabic conditions spanning MSA, accented MSA, and multiple dialects, the best-performing context-aware variants yield average relative WER reductions of 22.29% on MSA, 20.54% on accented MSA, and 9.15% on dialectal Arabic. For CTC ASR on our Common Voice MSA testbed, proxy-guided selection reduces WER by 15.6% relative and recovers a substantial fraction of oracle n-best gains, showing that external-context guidance can also benefit non-promptable ASR.
%U https://aclanthology.org/2026.findings-acl.1296/
%P 26029-26044
Markdown (Informal)
[Zero-Shot Context-Aware ASR for Diverse Arabic Varieties](https://aclanthology.org/2026.findings-acl.1296/) (Talafha et al., Findings 2026)
ACL
- Bashar Talafha, Amin Abu Alhassan, and Muhammad Abdul-Mageed. 2026. Zero-Shot Context-Aware ASR for Diverse Arabic Varieties. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26029–26044, San Diego, California, United States. Association for Computational Linguistics.
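
The abstract describes proxy-guided n-best selection only at a high level: pick the n-best candidate that minimizes a text-level distance to one or more proxy hypotheses. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the whitespace tokenization, the use of summed word-level edit distance, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def select_by_proxies(nbest: List[str], proxies: List[str]) -> str:
    """Return the n-best candidate with the smallest total distance to the proxies.

    Assumptions (not from the paper): whitespace tokenization, summed
    word-level edit distance, ties broken in favor of the higher-ranked
    candidate.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    proxy_tokens = [p.split() for p in proxies]

    def total_distance(hyp: str) -> int:
        tokens = hyp.split()
        return sum(word_edit_distance(tokens, p) for p in proxy_tokens)

    # min() returns the first minimal element, so n-best rank order breaks ties.
    return min(nbest, key=total_distance)


if __name__ == "__main__":
    # Hypothetical n-best list from a CTC model and two proxy transcripts.
    nbest = [
        "the cat sat on a mat",
        "the cat sat on the mat",
        "a cat sat on the hat",
    ]
    proxies = ["the cat sat on the mat today", "cat sat on the mat"]
    print(select_by_proxies(nbest, proxies))
```

In this sketch an empty proxy list gives every candidate a distance of zero, so the top-ranked hypothesis is returned unchanged. A character-level or normalized distance could be swapped in for `word_edit_distance` without changing the selection logic.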