Multimodal Language Analysis with Recurrent Multistage Fusion

Paul Pu Liang, Ziyin Liu, AmirAli Bagher Zadeh, Louis-Philippe Morency


Abstract
Computational modeling of human multimodal language is an emerging research area in natural language processing spanning the language, visual, and acoustic modalities. Comprehending multimodal language requires modeling not only the interactions within each modality (intra-modal interactions) but, more importantly, the interactions between modalities (cross-modal interactions). In this paper, we propose the Recurrent Multistage Fusion Network (RMFN), which decomposes the fusion problem into multiple stages, each focused on a subset of multimodal signals for specialized, effective fusion. Cross-modal interactions are modeled using this multistage fusion approach, which builds upon the intermediate representations of previous stages. Temporal and intra-modal interactions are modeled by integrating our proposed fusion approach with a system of recurrent neural networks. The RMFN achieves state-of-the-art performance in modeling human multimodal language across three public datasets relating to multimodal sentiment analysis, emotion recognition, and speaker traits recognition. We provide visualizations to show that each stage of fusion focuses on a different subset of multimodal signals, learning increasingly discriminative multimodal representations.
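The multistage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the softmax weighting, the tanh fusion layer, the sharing of weights across stages, and the random weights are all assumptions made for illustration. It shows only the fusion loop at a single time step; in the full model the abstract describes, the inputs would come from a system of recurrent networks.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def random_matrix(rows, cols, rng):
    # Hypothetical untrained weights, for illustration only.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def multistage_fusion(features, num_stages, W_attn, W_fuse):
    """Fuse a concatenated multimodal feature vector over several stages."""
    n = len(features)
    fused = [0.0] * n  # intermediate representation carried between stages
    attn_history = []
    for _ in range(num_stages):
        # Weight each signal, conditioned on the previous stage's output,
        # so that each stage can emphasize a different subset of signals.
        weights = softmax(matvec(W_attn, features + fused))
        highlighted = [w * x for w, x in zip(weights, features)]
        # Combine the emphasized signals with the previous representation.
        fused = [math.tanh(z) for z in matvec(W_fuse, highlighted + fused)]
        attn_history.append(weights)
    return fused, attn_history

rng = random.Random(0)
language, visual, acoustic = [0.2, -0.1, 0.4], [0.5, 0.3], [-0.2, 0.1]
features = language + visual + acoustic
n = len(features)
W_attn = random_matrix(n, 2 * n, rng)
W_fuse = random_matrix(n, 2 * n, rng)
fused, attn = multistage_fusion(features, 3, W_attn, W_fuse)
```

Here `attn` holds one weight vector per stage, which is the kind of quantity one could inspect to see which signals a given stage emphasizes.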
Anthology ID:
D18-1014
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
150–161
URL:
https://aclanthology.org/D18-1014
DOI:
10.18653/v1/D18-1014
Cite (ACL):
Paul Pu Liang, Ziyin Liu, AmirAli Bagher Zadeh, and Louis-Philippe Morency. 2018. Multimodal Language Analysis with Recurrent Multistage Fusion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 150–161, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Multimodal Language Analysis with Recurrent Multistage Fusion (Liang et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1014.pdf
Attachment:
 D18-1014.Attachment.zip
Video:
 https://vimeo.com/305210831
Data
IEMOCAP