Dual Low-Rank Multimodal Fusion

Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang


Abstract
Tensor-based fusion methods have proven effective in multimodal fusion tasks. However, existing tensor-based methods make poor use of the fine-grained temporal dynamics of multimodal sequential features. Motivated by this observation, this paper proposes a novel multimodal fusion method called Fine-Grained Temporal Low-Rank Multimodal Fusion (FT-LMF). FT-LMF correlates the features of individual time steps across multiple modalities, but its calculation involves multiplications of high-order tensors. This paper further proposes Dual Low-Rank Multimodal Fusion (Dual-LMF), which reduces the computational complexity of FT-LMF through low-rank tensor approximation along dual dimensions of the input features. Dual-LMF is conceptually simple, effective, and efficient in practice. Empirical studies on benchmark multimodal analysis tasks show that the proposed methods outperform state-of-the-art tensor-based fusion methods at a similar computational complexity.
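As background for the abstract, the following is a minimal sketch of the general idea behind low-rank tensor fusion: a weight tensor that factorizes into per-modality factors can be applied without ever building the high-order tensor. It is not the FT-LMF or Dual-LMF formulation from the paper, which the abstract does not spell out. The function names, the choice of two modalities, the appended constant 1, and all shapes are assumptions made for illustration.

```python
import numpy as np

def low_rank_fusion(features, factors):
    """Fuse modality vectors with rank-R factors (illustrative sketch).

    features: list of 1-D arrays z_m with length d_m (hypothetical inputs).
    factors:  list of arrays W_m with shape (R, d_m + 1, h), one per modality.
    Returns a fused vector of length h.
    """
    fused = None
    for z, W in zip(features, factors):
        z1 = np.append(z, 1.0)                 # append a constant 1 (assumed convention)
        proj = np.einsum("d,rdh->rh", z1, W)   # project onto each rank component
        fused = proj if fused is None else fused * proj  # elementwise product across modalities
    return fused.sum(axis=0)                   # sum over the R rank components

def full_tensor_fusion(features, factors):
    """Reference computation for two modalities: build the outer-product
    tensor and the full weight tensor explicitly, then contract them."""
    za = np.append(features[0], 1.0)
    zb = np.append(features[1], 1.0)
    Z = np.outer(za, zb)                                    # (d_a + 1, d_b + 1)
    W = np.einsum("rah,rbh->abh", factors[0], factors[1])   # rank-R weight tensor
    return np.einsum("ab,abh->h", Z, W)
```

For two modalities, both functions return the same vector up to floating-point error. The low-rank version only ever multiplies each modality by its own factor, which is where the computational saving in this family of methods comes from.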
Anthology ID: 2020.findings-emnlp.35
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 377–387
URL: https://aclanthology.org/2020.findings-emnlp.35
DOI: 10.18653/v1/2020.findings-emnlp.35
Cite (ACL): Tao Jin, Siyu Huang, Yingming Li, and Zhongfei Zhang. 2020. Dual Low-Rank Multimodal Fusion. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 377–387, Online. Association for Computational Linguistics.
Cite (Informal): Dual Low-Rank Multimodal Fusion (Jin et al., Findings 2020)
PDF: https://aclanthology.org/2020.findings-emnlp.35.pdf
Optional supplementary material: 2020.findings-emnlp.35.OptionalSupplementaryMaterial.zip