On the Nature of Discrete Speech Representations in Multilingual Self-supervised Models

Badr M. Abdullah, Mohammed Maqsood Shaik, Dietrich Klakow


Abstract
Self-supervision has emerged as an effective paradigm for learning representations of spoken language from raw audio without explicit labels or transcriptions. Self-supervised speech models, such as wav2vec 2.0 (Baevski et al., 2020) and HuBERT (Hsu et al., 2021), have shown significant promise in improving performance across different speech processing tasks. One of the main advantages of self-supervised speech models is that they can be pre-trained on a large sample of languages (Conneau et al., 2020; Babu et al., 2022), which facilitates cross-lingual transfer for low-resource languages (San et al., 2021). State-of-the-art self-supervised speech models include a quantization module that transforms the continuous acoustic input into a sequence of discrete units. One of the key questions in this area is whether the discrete representations learned via self-supervision are language-specific or language-universal. In other words, we ask: do the discrete units learned by a multilingual speech model represent the same speech sounds across languages, or do they differ based on the specific language being spoken? From a practical perspective, this question has important implications for the development of speech models that can generalize across languages, particularly for low-resource languages. Furthermore, examining the level of linguistic abstraction in speech models that lack symbolic supervision is also relevant to the field of human language acquisition (Dupoux, 2018).
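To make the quantization step concrete, the sketch below shows the simplest form of the idea: mapping each continuous frame vector to the index of its nearest codebook entry. This is a simplified illustration, not the abstract's actual method; wav2vec 2.0 in fact uses differentiable Gumbel-softmax product quantization with multiple codebook groups, and the frame dimension, codebook size, and random data here are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 frames of 16-dim continuous features and a
# codebook of 8 entries (real models use far larger codebooks and
# learn them jointly with the encoder).
frames = rng.normal(size=(50, 16))
codebook = rng.normal(size=(8, 16))

def quantize(x, codebook):
    """Assign each continuous frame to the index of its nearest codebook entry."""
    # Pairwise squared Euclidean distances, shape (num_frames, num_codes).
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# The continuous input becomes a sequence of discrete unit IDs.
units = quantize(frames, codebook)
print(units.shape)  # (50,)
```

The research question in the abstract can then be phrased in these terms: when the same speech sound occurs in two languages, does the model assign it the same unit IDs, or do the assignments diverge by language?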
Anthology ID:
2023.sigtyp-1.20
Volume:
Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Lisa Beinborn, Koustava Goswami, Saliha Muradoğlu, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Edoardo M. Ponti, Ryan Cotterell, Ekaterina Vylomova
Venue:
SIGTYP
Publisher:
Association for Computational Linguistics
Pages:
159–161
URL:
https://aclanthology.org/2023.sigtyp-1.20
DOI:
10.18653/v1/2023.sigtyp-1.20
Cite (ACL):
Badr M. Abdullah, Mohammed Maqsood Shaik, and Dietrich Klakow. 2023. On the Nature of Discrete Speech Representations in Multilingual Self-supervised Models. In Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 159–161, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
On the Nature of Discrete Speech Representations in Multilingual Self-supervised Models (Abdullah et al., SIGTYP 2023)
PDF:
https://aclanthology.org/2023.sigtyp-1.20.pdf