Exploring Multimodal Features and Fusion Strategies for Analyzing Disaster Tweets

Raj Pranesh


Abstract
Social media platforms such as Twitter often provide firsthand news during the outbreak of a crisis. Processing this information quickly is essential for planning response efforts and minimizing loss. In this paper, we therefore present an analysis of various multimodal feature fusion techniques for analyzing and classifying disaster tweets into multiple crisis events via transfer learning. In our study, we utilized three image models pre-trained on the ImageNet dataset and three fine-tuned language models to learn the visual and textual features of the data and combine them to make predictions. We present a systematic analysis of multiple intra-modal and cross-modal fusion strategies and their effect on the performance of the multimodal disaster classification system. In our experiments, we used 8,242 disaster tweets, each comprising image and text data, labeled with five disaster event classes. The results show that the multimodal model using a transformer-attention mechanism for intra-modal fusion and factorized bilinear pooling (FBP) for cross-modal fusion achieved the best performance.
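For reference, the sketch below illustrates factorized bilinear pooling for cross-modal fusion of a text vector and an image vector. It is a minimal, generic PyTorch illustration of the FBP technique named in the abstract; the feature dimensions, projection sizes, and normalization details are assumptions and may differ from the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedBilinearPooling(nn.Module):
    """Fuse a text vector and an image vector via factorized bilinear pooling.

    Follows the common low-rank (MFB-style) formulation: two linear projections
    replace the full bilinear weight tensor, followed by element-wise product,
    sum-pooling over the factor dimension, and power/L2 normalization.
    """

    def __init__(self, text_dim, image_dim, out_dim=256, factor_dim=4):
        super().__init__()
        self.out_dim = out_dim
        self.factor_dim = factor_dim
        # Low-rank projections for each modality.
        self.text_proj = nn.Linear(text_dim, out_dim * factor_dim)
        self.image_proj = nn.Linear(image_dim, out_dim * factor_dim)

    def forward(self, text_feat, image_feat):
        # Element-wise product of the projected features.
        joint = self.text_proj(text_feat) * self.image_proj(image_feat)
        # Sum-pool over the factor dimension.
        joint = joint.view(-1, self.out_dim, self.factor_dim).sum(dim=2)
        # Signed square-root (power) normalization, then L2 normalization.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-12)
        return F.normalize(joint, dim=-1)

# Example: fuse a 768-d language-model vector with a 2048-d CNN image feature.
fusion = FactorizedBilinearPooling(text_dim=768, image_dim=2048)
fused = fusion(torch.randn(8, 768), torch.randn(8, 2048))  # shape: (8, 256)
```

The fused vector can then be passed to a classification head over the five disaster event classes.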
Anthology ID:
2022.wnut-1.6
Volume:
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
62–68
URL:
https://aclanthology.org/2022.wnut-1.6
Cite (ACL):
Raj Pranesh. 2022. Exploring Multimodal Features and Fusion Strategies for Analyzing Disaster Tweets. In Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022), pages 62–68, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Exploring Multimodal Features and Fusion Strategies for Analyzing Disaster Tweets (Pranesh, WNUT 2022)
PDF:
https://aclanthology.org/2022.wnut-1.6.pdf
Data
CrisisMMD, ImageNet