Language ID Prediction from Speech Using Self-Attentive Pooling

Roman Bedyakin, Nikolay Mikhaylovskiy


Abstract
This memo describes the NTR-TSU submission to the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, which creates a need for domain- and speaker-invariant language ID systems. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results on the language identification task.
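The abstract names a Self-Attentive Pooling layer but does not spell out its formulation. The sketch below illustrates one common variant, under stated assumptions: each frame-level feature vector is scored through a small tanh projection and a learned context vector, the scores are normalized with a softmax over time, and the frames are averaged with those weights. The function name, the parameters W, b, and v, and the toy inputs are all hypothetical; this is not the authors' implementation.

```python
import math

def self_attentive_pooling(frames, W, b, v):
    """Collapse T frame vectors (each of size D) into one D-sized vector.

    frames: list of T lists of length D (e.g. per-time-step CNN outputs)
    W: D x A projection (list of D rows), b: length-A bias,
    v: length-A context vector. All names are illustrative assumptions.
    """
    scores = []
    for h in frames:
        # Hidden attention representation u = tanh(h W + b)
        u = [math.tanh(sum(h[d] * W[d][a] for d in range(len(h))) + b[a])
             for a in range(len(b))]
        # Scalar relevance score for this frame: dot product of u and v
        scores.append(sum(u_a * v_a for u_a, v_a in zip(u, v)))

    # Softmax over time, shifted by the max score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the frame vectors
    dim = len(frames[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy input: three frames of two features each (made-up values)
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [0.25, 0.75]]
b = [0.0, 0.0]
v = [1.0, -1.0]
pooled, weights = self_attentive_pooling(frames, W, b, v)
```

With a zero context vector every frame gets the same score, so the layer reduces to plain mean pooling over time. In a trained network W, b, and v would be learned jointly with the convolutional layers, letting the model weight informative frames more heavily.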
Anthology ID:
2021.sigtyp-1.12
Volume:
Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
Month:
June
Year:
2021
Address:
Online
Venues:
NAACL | SIGTYP
SIG:
SIGTYP
Publisher:
Association for Computational Linguistics
Pages:
130–135
URL:
https://aclanthology.org/2021.sigtyp-1.12
DOI:
10.18653/v1/2021.sigtyp-1.12
Cite (ACL):
Roman Bedyakin and Nikolay Mikhaylovskiy. 2021. Language ID Prediction from Speech Using Self-Attentive Pooling. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, pages 130–135, Online. Association for Computational Linguistics.
Cite (Informal):
Language ID Prediction from Speech Using Self-Attentive Pooling (Bedyakin & Mikhaylovskiy, SIGTYP 2021)
PDF:
https://aclanthology.org/2021.sigtyp-1.12.pdf