Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration

Ryan Soh-Eun Shim; Kwanghee Choi; Kalvin Chang; Ming-Hao Hsu; Florian Eichin; Zhizheng Wu; Alane Suhr; Michael A. Hedderich; David Harwath; David R. Mortensen; Barbara Plank

Abstract

Multilingual speech foundation models such as Whisper are trained on web-scale data, where the data for each language spans a myriad of regional varieties. However, different regional varieties often use different scripts to write the same language, so the script of speech recognition output is itself non-deterministic. To mitigate this problem, we show that script is linearly encoded in the activation space of multilingual speech models, and that modifying activations at inference time enables direct control over output script. We find that adding such script vectors to activations at test time can induce script change even in unconventional language-script pairings (e.g., Italian in Cyrillic and Japanese in Latin script). We apply this approach to post-hoc control over the script of speech recognition output, where we observe competitive performance across all model sizes of Whisper.
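The idea of a linearly encoded script direction can be illustrated with a toy sketch. The abstract does not say how the script vectors are computed, so the sketch below assumes a difference-of-means construction, one common way to obtain a steering vector. The activations are synthetic, and all names (`script_vector`, `steer`, `alpha`, the 8-dimensional hidden size) are hypothetical. This is not the authors' implementation and does not touch Whisper itself.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def script_vector(target_acts, source_acts):
    """Difference of class means (an assumed construction):
    points from source-script activations toward target-script ones."""
    mu_target = mean_vector(target_acts)
    mu_source = mean_vector(source_acts)
    return [t - s for t, s in zip(mu_target, mu_source)]

def steer(hidden, vector, alpha=1.0):
    """Add the scaled script vector to a single hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

def dot(a, b):
    """Inner product, used here to measure position along the script direction."""
    return sum(x * y for x, y in zip(a, b))

# Synthetic stand-in for model activations: two clusters that differ
# along one hypothetical "script" axis, plus small Gaussian noise.
rng = random.Random(0)
dim = 8
axis = [1.0 if i == 0 else 0.0 for i in range(dim)]

def sample(offset):
    return [rng.gauss(0.0, 0.1) + offset * a for a in axis]

latin_acts = [sample(-1.0) for _ in range(200)]
cyrillic_acts = [sample(+1.0) for _ in range(200)]

# Estimate the Latin -> Cyrillic direction and apply it to one Latin activation.
v = script_vector(cyrillic_acts, latin_acts)
h = latin_acts[0]
h_steered = steer(h, v, alpha=1.0)

# Projection onto the script direction, measured from the midpoint of the
# two class means: negative is the Latin side, positive the Cyrillic side.
midpoint = mean_vector(latin_acts + cyrillic_acts)
before = dot([x - m for x, m in zip(h, midpoint)], v)
after = dot([x - m for x, m in zip(h_steered, midpoint)], v)
```

In a real model, the same addition would be applied to intermediate activations during decoding, for example through a forward hook on a chosen layer. The choice of layer and of the scale `alpha` are further assumptions that the abstract does not settle.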

Anthology ID: 2026.findings-acl.464
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9532–9544
URL: https://aclanthology.org/2026.findings-acl.464/
Cite (ACL): Ryan Soh-Eun Shim, Kwanghee Choi, Kalvin Chang, Ming-Hao Hsu, Florian Eichin, Zhizheng Wu, Alane Suhr, Michael A. Hedderich, David Harwath, David R. Mortensen, and Barbara Plank. 2026. Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9532–9544, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration (Shim et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.464.pdf
Checklist: 2026.findings-acl.464.checklist.pdf
