Convolutional and Recurrent Neural Networks for Spoken Emotion Recognition
Aaron Keesing | Ian Watson | Michael Witbrock
Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association
We test four models proposed in the speech emotion recognition (SER) literature on 15 public and academically licensed datasets using speaker-independent cross-validation. Results indicate differences in model performance that depend partly on the dataset and the features used. We also show that a standard utterance-level feature set still performs competitively with neural models on some datasets. This work serves as a starting point for future model comparisons, and we open-source the testing code.
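To illustrate what speaker-independent cross-validation means in practice, the following is a minimal sketch, not the authors' released testing code. It assumes each utterance carries a speaker ID and assigns whole speakers to folds round-robin, so no speaker appears in both the training and test split of any fold. The function name `speaker_independent_folds` and the toy speaker labels are hypothetical.

```python
from collections import defaultdict


def speaker_independent_folds(speakers, n_folds):
    """Yield (train_idx, test_idx) pairs where no speaker is on both sides.

    `speakers` is a list with one speaker ID per utterance (hypothetical input).
    """
    # Group utterance indices by speaker ID.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speakers):
        by_speaker[spk].append(idx)

    unique = sorted(by_speaker)
    if n_folds < 2 or n_folds > len(unique):
        raise ValueError("n_folds must be between 2 and the number of speakers")

    # Assign whole speakers to folds round-robin (an illustrative choice).
    fold_speakers = [unique[k::n_folds] for k in range(n_folds)]

    for held_out in fold_speakers:
        held = set(held_out)
        test = sorted(i for s in held_out for i in by_speaker[s])
        train = [i for i, s in enumerate(speakers) if s not in held]
        yield train, test


# Toy example: six utterances from four speakers, split into two folds.
speakers = ["s1", "s1", "s2", "s3", "s3", "s4"]
folds = list(speaker_independent_folds(speakers, n_folds=2))
for train, test in folds:
    print("train:", train, "test:", test)
```

With real data, the same idea is usually applied with one fold per speaker or per recording session, depending on how many speakers a dataset contains.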