A ResNet-50-Based Convolutional Neural Network Model for Language ID Identification from Speech Recordings

Giuseppe G. A. Celano


Abstract
This paper describes the model built for the SIGTYP 2021 Shared Task aimed at identifying 18 typologically different languages from speech recordings. Mel-frequency cepstral coefficients derived from audio files are transformed into spectrograms, which are then fed into a ResNet-50-based CNN architecture. The final model achieved validation and test accuracies of 0.73 and 0.53, respectively.
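
The feature step named in the abstract (MFCCs rendered as an image-like input for a CNN) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the sample rate, FFT size, hop length, number of mel bands, number of coefficients, and the helper names (`mel_filterbank`, `mfcc`, `to_image`) are all assumptions, and the ResNet-50 classifier itself is not shown.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # All default values here are assumed for illustration only.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)           # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)
    # Unnormalized DCT-II over the mel axis yields the cepstral coefficients.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct @ log_mel.T                             # (n_mfcc, n_frames)

def to_image(coeffs):
    # Min-max scale to [0, 1] and repeat over 3 channels, the input layout
    # that ImageNet-style CNNs such as ResNet-50 typically expect.
    lo, hi = coeffs.min(), coeffs.max()
    scaled = (coeffs - lo) / (hi - lo + 1e-10)
    return np.stack([scaled] * 3, axis=0)

# One second of synthetic noise stands in for a real recording.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
image = to_image(mfcc(signal))
print(image.shape)
```

In a full pipeline, an array like `image` would be resized and batched before being passed to the CNN; those steps depend on the chosen framework and are left out here.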
Anthology ID: 2021.sigtyp-1.13
Volume: Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
Month: June
Year: 2021
Address: Online
Editors: Ekaterina Vylomova, Elizabeth Salesky, Sabrina Mielke, Gabriella Lapesa, Ritesh Kumar, Harald Hammarström, Ivan Vulić, Anna Korhonen, Roi Reichart, Edoardo Maria Ponti, Ryan Cotterell
Venue: SIGTYP
SIG: SIGTYP
Publisher: Association for Computational Linguistics
Pages: 136–144
URL: https://aclanthology.org/2021.sigtyp-1.13
DOI: 10.18653/v1/2021.sigtyp-1.13
Cite (ACL): Giuseppe G. A. Celano. 2021. A ResNet-50-Based Convolutional Neural Network Model for Language ID Identification from Speech Recordings. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, pages 136–144, Online. Association for Computational Linguistics.
Cite (Informal): A ResNet-50-Based Convolutional Neural Network Model for Language ID Identification from Speech Recordings (Celano, SIGTYP 2021)
PDF: https://aclanthology.org/2021.sigtyp-1.13.pdf