Amaan Rizvi
2023
Cross-Lingual Speaker Identification for Indian Languages
Amaan Rizvi, Anupam Jamatia, Dwijen Rudrapal, Kunal Chakma, Björn Gambäck
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
The paper introduces a cross-lingual speaker identification system for Indian languages that utilises a Long Short-Term Memory dense neural network (LSTM-DNN). The system was trained on English audio recordings and evaluated on Hindi, Kannada, Malayalam, Tamil, and Telugu data, with a view to examining how factors such as phonetic similarity and native accent affect performance. The model was fed MFCC (mel-frequency cepstral coefficient) features extracted from the audio files. For comparison, the corresponding mel-spectrogram images were used as input to a ResNet-50 model, and the raw audio was used to train a Siamese network. The LSTM-DNN model outperformed both of these models as well as two more traditional baseline speaker identification models, showing that deep learning models are superior to probabilistic models at capturing low-level speech features and learning speaker characteristics.
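The abstract does not state how the MFCC features were configured, so what follows is only a minimal, dependency-free sketch of the standard MFCC pipeline: pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, and DCT. The function names `mfcc` and `mel_filterbank` and every parameter value (25 ms frames with a 10 ms hop at 16 kHz, 26 filters, 13 coefficients, 0.97 pre-emphasis) are illustrative assumptions, not the authors' settings. A naive DFT is used purely to keep the example self-contained; a practical system would use an FFT library.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (assumed variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (high - low) / (n_filters + 1)
    mel_points = [low + i * step for i in range(n_filters + 2)]
    # Map each mel point to the nearest-lower FFT bin index
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    fbank = [[0.0] * n_bins for _ in range(n_filters)]
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        for k in range(left, centre):  # rising edge (empty if left == centre)
            fbank[j - 1][k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge (empty if centre == right)
            fbank[j - 1][k] = (right - k) / (right - centre)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (hypothetical helper)."""
    if not signal:
        return []
    # Pre-emphasis boosts high frequencies
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    # Only full frames are kept; a signal shorter than frame_len yields no frames
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + n] * window[n] for n in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)  # zero-pad to n_fft samples
        # Periodogram power spectrum over the non-negative frequency bins (naive DFT)
        power = []
        for k in range(n_fft // 2 + 1):
            re = sum(frame[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
            im = -sum(frame[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
            power.append((re * re + im * im) / n_fft)
        # Filterbank energies, floored to avoid log(0)
        energies = [max(sum(f[k] * power[k] for k in range(len(power))), 1e-10) for f in fbank]
        log_e = [math.log(e) for e in energies]
        # Unnormalised DCT-II of the log energies; keep the first n_coeffs
        coeffs = [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m in range(n_filters))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features
```

Each row of the returned list is one frame's coefficient vector, so a recording becomes a time-ordered sequence of vectors, which is the kind of input an LSTM consumes step by step.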