Cross-Lingual Speaker Identification for Indian Languages

Amaan Rizvi, Anupam Jamatia, Dwijen Rudrapal, Kunal Chakma, Björn Gambäck


Abstract
The paper introduces a cross-lingual speaker identification system for Indian languages, utilising a Long Short-Term Memory dense neural network (LSTM-DNN). The system was trained on audio recordings in English and evaluated on data from Hindi, Kannada, Malayalam, Tamil, and Telugu, to examine how factors such as phonetic similarity and native accent affect performance. The model took as input MFCC (mel-frequency cepstral coefficient) features extracted from the audio files. For comparison, the corresponding mel-spectrogram images were also used as input to a ResNet-50 model, while the raw audio was used to train a Siamese network. The LSTM-DNN model outperformed the other two models as well as two more traditional baseline speaker identification models, showing that deep learning models are superior to probabilistic models for capturing low-level speech features and learning speaker characteristics.
Anthology ID:
2023.ranlp-1.105
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
979–987
URL:
https://aclanthology.org/2023.ranlp-1.105
Cite (ACL):
Amaan Rizvi, Anupam Jamatia, Dwijen Rudrapal, Kunal Chakma, and Björn Gambäck. 2023. Cross-Lingual Speaker Identification for Indian Languages. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 979–987, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Cross-Lingual Speaker Identification for Indian Languages (Rizvi et al., RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.105.pdf