AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models

Nicola Messina, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato


Abstract
This paper describes the system used by the AIMH Team to approach SemEval-2021 Task 6. We propose an architecture based on the transformer model to process the multimodal content (text and images) of memes. Our architecture, called DVTT (Double Visual Textual Transformer), approaches Subtasks 1 and 3 of Task 6 as multi-label classification problems: the text and/or images of the meme are processed, and the probability that each possible persuasion technique is present is returned as a result. DVTT uses two complete transformer networks, one working on text and one on images, that are mutually conditioned. In each network, one of the two modalities acts as the main one and the other intervenes to enrich it, which yields two distinct modes of operation. The outputs of the two transformers are merged by averaging the inferred probabilities for each possible label, and the overall network is trained end-to-end with a binary cross-entropy loss.
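The merging step described in the abstract (averaging per-label probabilities from the two branches and scoring them with binary cross-entropy) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example logits, the use of a sigmoid to obtain per-label probabilities, and the 0.5 decision threshold are all assumptions made for illustration, and the transformer branches themselves are replaced by hand-written numbers.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def average_probabilities(text_logits, visual_logits):
    """Average the per-label probabilities of the two branches.

    Assumption: each branch emits one logit per persuasion technique,
    and a sigmoid turns each logit into an independent probability.
    """
    if len(text_logits) != len(visual_logits):
        raise ValueError("both branches must score the same set of labels")
    return [
        (sigmoid(t) + sigmoid(v)) / 2.0
        for t, v in zip(text_logits, visual_logits)
    ]


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over labels for one example."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# Hypothetical logits for three labels from the two branches.
text_logits = [2.0, -1.0, -0.5]
visual_logits = [0.0, -3.0, -0.5]
targets = [1, 0, 0]  # hypothetical ground truth (multi-label)

probs = average_probabilities(text_logits, visual_logits)
loss = binary_cross_entropy(probs, targets)

# Assumed decision rule: predict a label when its averaged probability >= 0.5.
predicted = [int(p >= 0.5) for p in probs]
```

Because each label is scored independently, any number of techniques can be predicted for the same meme, which is what makes the setup multi-label rather than multi-class.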
Anthology ID:
2021.semeval-1.140
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Venue:
SemEval
SIGs:
SIGSEM | SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1020–1026
URL:
https://aclanthology.org/2021.semeval-1.140
DOI:
10.18653/v1/2021.semeval-1.140
Cite (ACL):
Nicola Messina, Fabrizio Falchi, Claudio Gennaro, and Giuseppe Amato. 2021. AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 1020–1026, Online. Association for Computational Linguistics.
Cite (Informal):
AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models (Messina et al., SemEval 2021)
PDF:
https://aclanthology.org/2021.semeval-1.140.pdf
Code:
mesnico/memepersuasiondetection