YNWAAZ at SemEval-2026 Task 1: Bridging the Semantic-Visual Gap: Multimodal Humor Generation

Mohammad Erfan Zare; Tahere Abbasi; Hadi Veisi; Sayin Ala; Hanieh Naderi

Abstract

Developing Computational Humor systems at a multilingual and multimodal scale requires transcending simple text generation paradigms to focus on intent and context understanding. In this study, we address two key limitations in Foundation Models: Association Failure in textual tasks, which prevents the formation of coherent semantic links between incongruous concepts, and Temporal Blindness in video processing, which disrupts narrative comprehension. To tackle these challenges, we propose a unified architecture comprising an Intent-Aware RAG system for mitigating linguistic gaps across English, Spanish, and Chinese, and a Cascaded Visual Perception pipeline for modeling the narrative structure of video content. A key innovation of this work is the utilization of small language models (TinyLlama) as a Semantic Denoise Filter, converting noisy visual signals into structured, coherent textual representations. Experimental results demonstrate that this modular architecture reduces cultural-semantic gaps in certain languages and produces outputs that generally align better with human humor preferences, though highly nuanced languages still present a challenge.

Anthology ID: 2026.semeval-1.171
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 1308–1318
URL: https://aclanthology.org/2026.semeval-1.171/
Cite (ACL): Mohammad Erfan Zare, Tahere Abbasi, Hadi Veisi, Sayin Ala, and Hanieh Naderi. 2026. YNWAAZ at SemEval-2026 Task 1: Bridging the Semantic-Visual Gap: Multimodal Humor Generation. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1308–1318, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): YNWAAZ at SemEval-2026 Task 1: Bridging the Semantic-Visual Gap: Multimodal Humor Generation (Zare et al., SemEval 2026)
PDF: https://aclanthology.org/2026.semeval-1.171.pdf
Supplementary material: 2026.semeval-1.171.SupplementaryMaterial.zip