WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada

Braeden Sherritt; Isar Nejadgholi; Efstratios Aivaliotis; Khaled Mslmani; Marzieh Amini

WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada

Braeden Sherritt, Isar Nejadgholi, Efstratios Aivaliotis, Khaled Mslmani, Marzieh Amini

Abstract

Rapid information access is vital during wildfires, yet traditional data sources are slow and costly. Social media offers real-time updates, but extracting relevant insights remains a challenge. In this work, we focus on multimodal wildfire social media data, which, although existing in current datasets, is currently underrepresented in Canadian contexts. We present WildFireCan-MMD, a new multimodal dataset of X posts from recent Canadian wildfires, annotated across twelve key themes. We evaluate zero-shot vision-language models on this dataset and compare their results with those of custom-trained and baseline classifiers. We show that while baseline methods and zero-shot prompting offer quick deployment, custom-trained models outperform them when labelled data is available. Our best-performing custom model reaches 84.48±0.69% f-score, outperforming VLMs and baseline classifiers. We also demonstrate how this model can be used to uncover trends during wildfires, through the collection and analysis of a large unlabeled dataset. Our dataset facilitates future research in wildfire response, and our findings highlight the importance of tailored datasets and task-specific training. Importantly, such datasets should be localized, as disaster response requirements vary across regions and contexts.

Anthology ID:: 2025.findings-ijcnlp.53
Volume:: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:: December
Year:: 2025
Address:: Mumbai, India
Editors:: Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venue:: Findings
SIG:
Publisher:: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Note:
Pages:: 884–902
Language:
URL:: https://aclanthology.org/2025.findings-ijcnlp.53/
DOI:
Bibkey:
Cite (ACL):: Braeden Sherritt, Isar Nejadgholi, Efstratios Aivaliotis, Khaled Mslmani, and Marzieh Amini. 2025. WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 884–902, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):: WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada (Sherritt et al., Findings 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.findings-ijcnlp.53.pdf

PDF Cite Search Fix data