Multimodal Attention Is All You Need

Marco Saioni, Cristina Giannone


Abstract
In this paper, we present a multimodal model for classifying fake news. The model's distinguishing feature is its cross-attention mechanism. Cross-attention is an evolution of the attention mechanism that lets the model examine intermodal relationships, so that it can better understand information from different modalities and focus simultaneously on the relevant parts of the data extracted from each. We tested the model on the MULTI-Fake-DetectiVE data from EVALITA 2023. The presented model is particularly effective on both tasks: classifying fake news and evaluating the intermodal relationship.
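As a rough illustration of the cross-attention idea described in the abstract, the following is a minimal sketch of scaled dot-product cross-attention in plain Python. It is not the authors' architecture: the function names (`softmax`, `cross_attention`), the toy two-dimensional vectors, and the choice of text tokens as queries and image patches as keys and values are all assumptions made for illustration, and the learned projection matrices a real model would use are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (no learned projections).

    queries: vectors from one modality (assumed here: text tokens)
    keys, values: vectors from the other modality (assumed: image patches)
    Returns the attended outputs and the attention weights per query.
    """
    d = len(keys[0])  # key dimension used for scaling
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key of the other modality.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        # Weighted sum of the values, one output component at a time.
        out = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy inputs: 2 text tokens attending over 3 image patches.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = cross_attention(text_tokens, image_patches, image_patches)
```

Each row of `weights` is a probability distribution over the image patches, so it indicates which parts of the other modality a given text token focuses on.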
Anthology ID:
2024.clicit-1.94
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
873–879
URL:
https://aclanthology.org/2024.clicit-1.94/
Cite (ACL):
Marco Saioni and Cristina Giannone. 2024. Multimodal Attention Is All You Need. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 873–879, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Multimodal Attention Is All You Need (Saioni & Giannone, CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.94.pdf