@inproceedings{rodriguez-bribiesca-etal-2021-multimodal,
title = "Multimodal Weighted Fusion of Transformers for Movie Genre Classification",
author = "Rodr{\'\i}guez Bribiesca, Isaac and
L{\'o}pez Monroy, Adri{\'a}n Pastor and
Montes-y-G{\'o}mez, Manuel",
editor = "Zadeh, Amir and
Morency, Louis-Philippe and
Liang, Paul Pu and
Ross, Candace and
Salakhutdinov, Ruslan and
Poria, Soujanya and
Cambria, Erik and
Shi, Kelly",
booktitle = "Proceedings of the Third Workshop on Multimodal Artificial Intelligence",
month = jun,
year = "2021",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.maiworkshop-1.1",
doi = "10.18653/v1/2021.maiworkshop-1.1",
pages = "1--5",
abstract = "The Multimodal Transformer showed to be a competitive model for multimodal tasks involving textual, visual and audio signals. However, as more modalities are involved, its late fusion by concatenation starts to have a negative impact on the model{'}s performance. Besides, interpreting model{'}s predictions becomes difficult, as one would have to look at the different attention activation matrices. In order to overcome these shortcomings, we propose to perform late fusion by adding a GMU module, which effectively allows the model to weight modalities at instance level, improving its performance while providing a better interpretability mechanism. In the experiments, we compare our proposed model (MulT-GMU) against the original implementation (MulT-Concat) and a SOTA model tested in a movie genre classification dataset. Our approach, MulT-GMU, outperforms both, MulT-Concat and previous SOTA model.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="rodriguez-bribiesca-etal-2021-multimodal">
<titleInfo>
<title>Multimodal Weighted Fusion of Transformers for Movie Genre Classification</title>
</titleInfo>
<name type="personal">
<namePart type="given">Isaac</namePart>
<namePart type="family">Rodríguez Bribiesca</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Adrián</namePart>
<namePart type="given">Pastor</namePart>
<namePart type="family">López Monroy</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Manuel</namePart>
<namePart type="family">Montes-y-Gómez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2021-06</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the Third Workshop on Multimodal Artificial Intelligence</title>
</titleInfo>
<name type="personal">
<namePart type="given">Amir</namePart>
<namePart type="family">Zadeh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Louis-Philippe</namePart>
<namePart type="family">Morency</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Paul</namePart>
<namePart type="given">Pu</namePart>
<namePart type="family">Liang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Candace</namePart>
<namePart type="family">Ross</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ruslan</namePart>
<namePart type="family">Salakhutdinov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Soujanya</namePart>
<namePart type="family">Poria</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Erik</namePart>
<namePart type="family">Cambria</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kelly</namePart>
<namePart type="family">Shi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Mexico City, Mexico</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>The Multimodal Transformer showed to be a competitive model for multimodal tasks involving textual, visual and audio signals. However, as more modalities are involved, its late fusion by concatenation starts to have a negative impact on the model’s performance. Besides, interpreting model’s predictions becomes difficult, as one would have to look at the different attention activation matrices. In order to overcome these shortcomings, we propose to perform late fusion by adding a GMU module, which effectively allows the model to weight modalities at instance level, improving its performance while providing a better interpretability mechanism. In the experiments, we compare our proposed model (MulT-GMU) against the original implementation (MulT-Concat) and a SOTA model tested in a movie genre classification dataset. Our approach, MulT-GMU, outperforms both, MulT-Concat and previous SOTA model.</abstract>
<identifier type="citekey">rodriguez-bribiesca-etal-2021-multimodal</identifier>
<identifier type="doi">10.18653/v1/2021.maiworkshop-1.1</identifier>
<location>
<url>https://aclanthology.org/2021.maiworkshop-1.1</url>
</location>
<part>
<date>2021-06</date>
<extent unit="page">
<start>1</start>
<end>5</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Multimodal Weighted Fusion of Transformers for Movie Genre Classification
%A Rodríguez Bribiesca, Isaac
%A López Monroy, Adrián Pastor
%A Montes-y-Gómez, Manuel
%Y Zadeh, Amir
%Y Morency, Louis-Philippe
%Y Liang, Paul Pu
%Y Ross, Candace
%Y Salakhutdinov, Ruslan
%Y Poria, Soujanya
%Y Cambria, Erik
%Y Shi, Kelly
%S Proceedings of the Third Workshop on Multimodal Artificial Intelligence
%D 2021
%8 June
%I Association for Computational Linguistics
%C Mexico City, Mexico
%F rodriguez-bribiesca-etal-2021-multimodal
%X The Multimodal Transformer showed to be a competitive model for multimodal tasks involving textual, visual and audio signals. However, as more modalities are involved, its late fusion by concatenation starts to have a negative impact on the model’s performance. Besides, interpreting model’s predictions becomes difficult, as one would have to look at the different attention activation matrices. In order to overcome these shortcomings, we propose to perform late fusion by adding a GMU module, which effectively allows the model to weight modalities at instance level, improving its performance while providing a better interpretability mechanism. In the experiments, we compare our proposed model (MulT-GMU) against the original implementation (MulT-Concat) and a SOTA model tested in a movie genre classification dataset. Our approach, MulT-GMU, outperforms both, MulT-Concat and previous SOTA model.
%R 10.18653/v1/2021.maiworkshop-1.1
%U https://aclanthology.org/2021.maiworkshop-1.1
%U https://doi.org/10.18653/v1/2021.maiworkshop-1.1
%P 1-5
Markdown (Informal)
[Multimodal Weighted Fusion of Transformers for Movie Genre Classification](https://aclanthology.org/2021.maiworkshop-1.1) (Rodríguez Bribiesca et al., maiworkshop 2021)
ACL