Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd

Yejin Son, Saejin Kim, Dongjun Min, Youngjae Yu


Abstract
Commonsense reasoning in multimodal contexts remains a foundational challenge in artificial intelligence. We introduce Multimodal UNcommonsense (MUN), a benchmark designed to evaluate models’ ability to handle scenarios that deviate from typical visual or contextual expectations. MUN pairs visual scenes with surprising or unlikely outcomes described in natural language, prompting models to either rationalize seemingly odd images using everyday logic or uncover unexpected interpretations in ordinary scenes. To support this task, we propose a retrieval-based in-context learning (R-ICL) framework that transfers reasoning capabilities from larger models to smaller ones without additional training. Leveraging a novel Multimodal Ensemble Retriever (MER), our method identifies semantically relevant exemplars even when image and text pairs are deliberately discordant. Experiments show an average improvement of 8.3% over baseline ICL methods, highlighting the effectiveness of R-ICL in low-frequency, atypical settings. MUN opens new directions for evaluating and improving visual-language models’ robustness and adaptability in real-world, culturally diverse, and non-prototypical scenarios.
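The abstract does not say how the Multimodal Ensemble Retriever scores exemplars, so the following is only an illustrative sketch of retrieval-based in-context learning in general, not the authors' implementation. It assumes that each exemplar carries precomputed image and text embeddings, that the ensemble is a simple weighted average of the two cosine similarities, and that the retrieved exemplars are concatenated into a few-shot prompt. The names `ensemble_retrieve`, `build_prompt`, `image_weight`, and the dictionary keys are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ensemble_retrieve(query, pool, k=2, image_weight=0.5):
    """Return the k exemplars with the highest combined similarity to the query.

    Assumption: the "ensemble" is a weighted average of image-side and
    text-side cosine similarity. The paper's actual MER may differ.
    """
    scored = []
    for exemplar in pool:
        s_img = cosine(query["image_vec"], exemplar["image_vec"])
        s_txt = cosine(query["text_vec"], exemplar["text_vec"])
        score = image_weight * s_img + (1.0 - image_weight) * s_txt
        scored.append((score, exemplar))
    # Sort by score only, highest first; ties keep their pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [exemplar for _, exemplar in scored[:k]]


def build_prompt(query, exemplars):
    """Concatenate retrieved exemplars, then the unanswered query, into one prompt."""
    blocks = [
        f"Scene: {ex['scene']}\nOutcome: {ex['outcome']}\nExplanation: {ex['explanation']}"
        for ex in exemplars
    ]
    blocks.append(f"Scene: {query['scene']}\nOutcome: {query['outcome']}\nExplanation:")
    return "\n\n".join(blocks)
```

In this sketch the explanations attached to the exemplars would come from a larger model, and the assembled prompt would be passed to a smaller model at inference time, so no weights are updated.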
Anthology ID:
2025.findings-emnlp.954
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17586–17609
URL:
https://aclanthology.org/2025.findings-emnlp.954/
Cite (ACL):
Yejin Son, Saejin Kim, Dongjun Min, and Youngjae Yu. 2025. Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 17586–17609, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd (Son et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.954.pdf
Checklist:
 2025.findings-emnlp.954.checklist.pdf