An Exploration of Mamba for Speech Self-Supervised Models

Tzu-Quan Lin; Heng-Cheng Kuo; Tzu-Chieh Wei; Hsi-Chun Cheng; Chun Wei Chen; Hsien-Fu Hsiao; Yu Tsao; Hung-yi Lee

An Exploration of Mamba for Speech Self-Supervised Models

Tzu-Quan Lin, Heng-Cheng Kuo, Tzu-Chieh Wei, Hsi-Chun Cheng, Chun Wei Chen, Hsien-Fu Hsiao, Yu Tsao, Hung-yi Lee

Abstract

While Mamba has demonstrated strong performance in language modeling, its potential as a speech self-supervised learning (SSL) model remains underexplored, with prior studies limited to isolated tasks. To address this, we explore Mamba-based HuBERT models as alternatives to Transformer-based SSL architectures. Leveraging the linear-time Selective State Space, these models enable fine-tuning on long-context ASR with significantly lower compute. Moreover, they show superior performance when fine-tuned for streaming ASR. Beyond fine-tuning, these models show competitive performance on SUPERB probing benchmarks, particularly in causal settings. Our analysis shows that they yield higher-quality quantized representations and capture speaker-related features more distinctly than Transformer-based models. These findings highlight Mamba-based SSL as a promising and complementary direction for long-sequence modeling, real-time speech modeling, and speech unit extraction. The codebase is available at https://github.com/hckuo145/Mamba-based-HuBERT.

Anthology ID:: 2026.acl-long.470
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 10337–10350
Language:
URL:: https://aclanthology.org/2026.acl-long.470/
DOI:
Bibkey:
Cite (ACL):: Tzu-Quan Lin, Heng-Cheng Kuo, Tzu-Chieh Wei, Hsi-Chun Cheng, Chun Wei Chen, Hsien-Fu Hsiao, Yu Tsao, and Hung-yi Lee. 2026. An Exploration of Mamba for Speech Self-Supervised Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10337–10350, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: An Exploration of Mamba for Speech Self-Supervised Models (Lin et al., ACL 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.acl-long.470.pdf
Checklist:: 2026.acl-long.470.checklist.pdf

PDF Cite Search Checklist Fix data