Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks

Danae Sanchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras


Abstract
Effectively leveraging multimodal information from social media posts is essential to various downstream tasks such as sentiment analysis, sarcasm detection, or hate speech classification. Jointly modeling text and images is challenging because cross-modal semantics may be hidden or the relation between image and text may be weak. However, prior work on multimodal classification of social media posts has not yet addressed these challenges. In this work, we present an extensive study on the effectiveness of using two auxiliary losses jointly with the main task when fine-tuning multimodal models. First, Image-Text Contrastive (ITC) is designed to minimize the distance between image-text representations within a post, thereby effectively bridging the gap between posts where the image plays an important role in conveying the post’s meaning. Second, Image-Text Matching (ITM) enhances the model’s ability to understand the semantic relationship between images and text, thus improving its capacity to handle ambiguous or loosely related posts. We combine these objectives with five multimodal models, demonstrating consistent improvements of up to 2.6 F1 points across five diverse social media datasets. Our comprehensive analysis identifies the specific scenarios where each auxiliary task is most effective.
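As a rough illustration of how the two auxiliary losses can be added to a main classification loss, the sketch below uses plain Python. It is not the authors' implementation. It assumes ITC is a symmetric in-batch contrastive loss over normalized embeddings, ITM is a binary matched/mismatched classification loss, and the three terms are combined as a weighted sum. The function names, the temperature of 0.07, and the weights `w_itc` and `w_itm` are hypothetical choices made for this example.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (left unchanged if it is all zeros).
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    # Negative log-softmax probability of the target index (numerically stable).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def itc_loss(image_embs, text_embs, temperature=0.07):
    # Assumed ITC form: the image and text of the same post (same batch index)
    # are positives; every other pairing in the batch is a negative.
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    image_to_text = sum(cross_entropy(sim[k], k) for k in range(n)) / n
    text_to_image = sum(
        cross_entropy([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)


def itm_loss(match_logits, is_match):
    # Assumed ITM form: binary cross-entropy on a "does this image belong
    # to this text?" logit (label 1 = original pair, 0 = mismatched pair).
    total = 0.0
    for z, y in zip(match_logits, is_match):
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(match_logits)


def joint_loss(task_logits, task_labels, image_embs, text_embs,
               match_logits, is_match, w_itc=1.0, w_itm=1.0):
    # Main-task cross-entropy plus weighted auxiliary terms
    # (w_itc and w_itm are hypothetical hyperparameters).
    task = sum(
        cross_entropy(l, y) for l, y in zip(task_logits, task_labels)
    ) / len(task_labels)
    return (task
            + w_itc * itc_loss(image_embs, text_embs)
            + w_itm * itm_loss(match_logits, is_match))


# Toy batch of two posts with 2-dimensional embeddings.
loss = joint_loss(
    task_logits=[[2.0, 0.0], [0.0, 2.0]],
    task_labels=[0, 1],
    image_embs=[[1.0, 0.0], [0.0, 1.0]],
    text_embs=[[0.9, 0.1], [0.2, 0.8]],
    match_logits=[1.5, -1.0],
    is_match=[1, 0],
)
print(f"joint loss: {loss:.4f}")
```

In an actual fine-tuning setup the embeddings and logits would come from the multimodal model and the sum would be backpropagated. The weighted-sum combination shown here is only one plausible way to train the objectives jointly.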
Anthology ID: 2024.findings-eacl.76
Volume: Findings of the Association for Computational Linguistics: EACL 2024
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1126–1137
URL: https://aclanthology.org/2024.findings-eacl.76
Cite (ACL): Danae Sanchez Villegas, Daniel Preotiuc-Pietro, and Nikolaos Aletras. 2024. Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1126–1137, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks (Sanchez Villegas et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-eacl.76.pdf