SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

Mahi Luthra; Jiayi Shen; Maxime Poli; Angelo Ortiz Tandazo; Yosuke Higuchi; Youssef Benchekroun; Martin Gleize; Charles-Éric Saint-James; Dongyan Lin; Phillip Rust; Angel Villar-Corrales; Surya; Vanessa Stark; Rashel Moritz; Juan Pino; Yann LeCun; Emmanuel Dupoux

Abstract

Human infants, with only a few hundred hours of speech exposure, acquire basic units of new languages, highlighting a striking efficiency gap compared to data-hungry self-supervised speech models. To address this gap, this paper introduces SpidR-Adapt for rapid adaptation of speech units to new languages using minimal unlabeled data. We cast such low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol that formulates the adaptation process as a bi-level optimization framework. To enable scalable meta-training under this framework, we propose a novel heuristic solution, first-order bi-level optimization (FOBLO), which avoids heavy computation costs. Finally, we stabilize meta-training with a robust initialization obtained through interleaved supervision, which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and downstream spoken language modeling scores (sWUGGY, sBLIMP, tSC), surpassing in-domain toplines after training on less than 1 h of target-language audio and delivering 100× greater data efficiency than standard multi-task training. These findings highlight a practical, architecture-agnostic path toward biologically inspired, data-efficient representations. We open-source the training code and model checkpoints at https://github.com/facebookresearch/spidr-adapt.

Anthology ID: 2026.acl-long.1325
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 28705–28728
URL: https://aclanthology.org/2026.acl-long.1325/
Cite (ACL): Mahi Luthra, Jiayi Shen, Maxime Poli, Angelo Ortiz Tandazo, Yosuke Higuchi, Youssef Benchekroun, Martin Gleize, Charles-Éric Saint-James, Dongyan Lin, Phillip Rust, Angel Villar-Corrales, Surya, Vanessa Stark, Rashel Moritz, Juan Pino, Yann LeCun, and Emmanuel Dupoux. 2026. SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28705–28728, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation (Luthra et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1325.pdf
Checklist: 2026.acl-long.1325.checklist.pdf
