Multimodal Phased Transformer for Sentiment Analysis

Junyan Cheng, Iordanis Fostiropoulos, Barry Boehm, Mohammad Soleymani


Abstract
Multimodal Transformers achieve superior performance in multimodal learning tasks. However, the quadratic complexity of the self-attention mechanism in Transformers limits their deployment on low-resource devices and makes their inference and training computationally expensive. We propose the multimodal Sparse Phased Transformer (SPT) to reduce the complexity and memory footprint of self-attention. SPT uses a sampling function to generate a sparse attention matrix and to compress a long sequence into a shorter sequence of hidden states. SPT concurrently captures interactions between the hidden states of different modalities at every layer. To further improve the efficiency of our method, we use layer-wise parameter sharing and Factorized Co-Attention, which shares parameters between Cross Attention Blocks, with minimal impact on task performance. We evaluate our model on three sentiment analysis datasets and achieve comparable or superior performance compared with existing methods, with a 90% reduction in the number of parameters. We conclude that SPT, along with parameter sharing, can capture multimodal interactions with reduced model size and improved sample efficiency.
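The cost argument in the abstract can be illustrated with a small counting sketch. It is not the authors' implementation: the abstract does not specify the sampling function, so a simple strided window is assumed here, and the names `sampled_attention_pattern`, `hidden_len`, and `window` are hypothetical. The sketch only compares how many attention scores a dense self-attention matrix needs against a sparse pattern in which a short sequence of hidden states attends to sampled input positions.

```python
def full_attention_entries(seq_len):
    """Number of scores in a dense self-attention matrix (quadratic in seq_len)."""
    return seq_len * seq_len


def sampled_attention_pattern(seq_len, hidden_len, window):
    """Assumed sampling function: hidden state i attends to a window of
    `window` consecutive input positions starting at i * stride.
    This strided choice is an illustration, not the paper's method."""
    stride = max(1, seq_len // hidden_len)
    pattern = []
    for i in range(hidden_len):
        start = i * stride
        # Clamp to the last valid position and drop duplicates.
        cols = sorted({min(start + k, seq_len - 1) for k in range(window)})
        pattern.append(cols)
    return pattern


def sparse_attention_entries(pattern):
    """Number of non-zero scores in the sparse attention matrix."""
    return sum(len(cols) for cols in pattern)


seq_len, hidden_len, window = 512, 32, 16
pattern = sampled_attention_pattern(seq_len, hidden_len, window)
dense = full_attention_entries(seq_len)
sparse = sparse_attention_entries(pattern)
print(dense, sparse)  # prints: 262144 512
```

Under these assumed settings, a 512-step input is summarized by 32 hidden states, and the number of attention scores grows with `hidden_len * window` rather than with the square of the input length.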
Anthology ID:
2021.emnlp-main.189
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2447–2458
URL:
https://aclanthology.org/2021.emnlp-main.189
DOI:
10.18653/v1/2021.emnlp-main.189
Cite (ACL):
Junyan Cheng, Iordanis Fostiropoulos, Barry Boehm, and Mohammad Soleymani. 2021. Multimodal Phased Transformer for Sentiment Analysis. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2447–2458, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Multimodal Phased Transformer for Sentiment Analysis (Cheng et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.189.pdf
Software:
 2021.emnlp-main.189.Software.zip
Video:
 https://aclanthology.org/2021.emnlp-main.189.mp4
Code:
 chengjunyan1/sp-transformer
Data:
 Multimodal Opinionlevel Sentiment Intensity, UR-FUNNY