%0 Conference Proceedings
%T MM-GATBT: Enriching Multimodal Representation Using Graph Attention Network
%A Seo, Seung Byum
%A Nam, Hyoungwook
%A Delgosha, Payam
%Y Ippolito, Daphne
%Y Li, Liunian Harold
%Y Pacheco, Maria Leonor
%Y Chen, Danqi
%Y Xue, Nianwen
%S Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
%D 2022
%8 July
%I Association for Computational Linguistics
%C Hybrid: Seattle, Washington + Online
%F seo-etal-2022-mm
%X While there have been advances in Natural Language Processing (NLP), their success has mainly been achieved by applying self-attention mechanisms to single or multiple modalities. Although this approach has brought significant improvements on multiple downstream tasks, it fails to capture the interactions between different entities. We therefore propose MM-GATBT, a multimodal graph representation learning model that captures not only the relational semantics within one modality but also the interactions between different modalities. Specifically, the proposed method constructs image-based node embeddings that contain the relational semantics of entities. Our empirical results show that MM-GATBT achieves state-of-the-art results among all published papers on the MM-IMDb dataset.
%R 10.18653/v1/2022.naacl-srw.14
%U https://aclanthology.org/2022.naacl-srw.14
%U https://doi.org/10.18653/v1/2022.naacl-srw.14
%P 106-112