2018
Multi-Modal Sequence Fusion via Recursive Attention for Emotion Recognition
Rory Beard | Ritwik Das | Raymond W. M. Ng | P. G. Keerthana Gopalakrishnan | Luka Eerens | Pawel Swietojanski | Ondrej Miksik
Proceedings of the 22nd Conference on Computational Natural Language Learning
Natural human communication is nuanced and inherently multi-modal. Humans possess specialised sensoria for processing vocal, visual, linguistic, and para-linguistic information, yet form an intricately fused percept of the multi-modal data stream that provides a holistic representation. Analysis of emotional content in face-to-face communication is a cognitive task to which humans are particularly attuned, given its sociological importance, and it poses a difficult challenge for machine emulation due to the subtlety and expressive variability of cross-modal cues. Inspired by the empirical success of recent so-called End-To-End Memory Networks and related works, we propose an approach based on recursive multi-attention with a shared external memory updated over multiple gated iterations of analysis. We evaluate our model across several large multi-modal datasets and show that a global contextualised memory with a gated memory update enables effective emotion recognition.
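As a rough illustration only (not the authors' released implementation), the sketch below shows one plausible reading of "recursive multi-attention with a shared external memory updated over multiple gated iterations": per-modality feature vectors are attended against a shared memory, and the memory is refreshed through a GRU-style gate over several iterations. The class name, dimensions, and gating form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecursiveAttentionFusion(nn.Module):
    """Minimal sketch: gated recursive attention over a shared memory."""

    def __init__(self, feat_dim: int, mem_dim: int, n_iters: int = 3):
        super().__init__()
        self.n_iters = n_iters
        self.attn = nn.Linear(feat_dim + mem_dim, 1)        # scores each modality against the memory
        self.gate = nn.Linear(feat_dim + mem_dim, mem_dim)   # controls how much of the memory to overwrite
        self.cand = nn.Linear(feat_dim + mem_dim, mem_dim)   # proposes the candidate memory update
        self.mem0 = nn.Parameter(torch.zeros(mem_dim))       # learned initial memory state

    def forward(self, modality_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (batch, n_modalities, feat_dim), e.g. pre-encoded audio/visual/text features
        batch, n_mod, _ = modality_feats.shape
        memory = self.mem0.expand(batch, -1)
        for _ in range(self.n_iters):
            mem_tiled = memory.unsqueeze(1).expand(-1, n_mod, -1)
            joint = torch.cat([modality_feats, mem_tiled], dim=-1)
            weights = F.softmax(self.attn(joint), dim=1)          # attention over modalities
            context = (weights * modality_feats).sum(dim=1)       # fused read vector
            gate_in = torch.cat([context, memory], dim=-1)
            g = torch.sigmoid(self.gate(gate_in))                 # gated memory update
            candidate = torch.tanh(self.cand(gate_in))
            memory = g * candidate + (1.0 - g) * memory
        return memory  # global contextualised representation, fed to an emotion classifier head
```

In this reading, the returned memory vector would be passed to a small classification head over the emotion labels; the number of iterations, memory size, and choice of gate are hyperparameters, not values taken from the paper.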