Convolutional and Recurrent Neural Networks for Spoken Emotion Recognition

Aaron Keesing, Ian Watson, Michael Witbrock


Abstract
We test four models proposed in the speech emotion recognition (SER) literature on 15 public and academically licensed datasets in speaker-independent cross-validation. Results indicate differences in the performance of the models, which depend partly on the dataset and features used. We also show that a standard utterance-level feature set still performs competitively with neural models on some datasets. This work serves as a starting point for future model comparisons, and we open-source the testing code.
Anthology ID:
2020.alta-1.13
Volume:
Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association
Month:
December
Year:
2020
Address:
Virtual Workshop
Editors:
Maria Kim, Daniel Beck, Meladel Mistica
Venue:
ALTA
Publisher:
Australasian Language Technology Association
Pages:
104–109
URL:
https://aclanthology.org/2020.alta-1.13
Cite (ACL):
Aaron Keesing, Ian Watson, and Michael Witbrock. 2020. Convolutional and Recurrent Neural Networks for Spoken Emotion Recognition. In Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association, pages 104–109, Virtual Workshop. Australasian Language Technology Association.
Cite (Informal):
Convolutional and Recurrent Neural Networks for Spoken Emotion Recognition (Keesing et al., ALTA 2020)
PDF:
https://aclanthology.org/2020.alta-1.13.pdf
Data:
IEMOCAP, MSP-IMPROV, ShEMO