Andrei Tiberiu Carp


2026

Multiword expressions (MWEs), particularly idioms, pose persistent challenges for vision-language systems due to their non-compositional semantics and culturally grounded meanings. This paper presents GLIMMER, a three-stage hybrid ranking system that evaluates how well images express the intended meaning of MWEs across 15 languages. Our approach uses LLM-generated semantic glosses as multilingual meaning anchors, combined with dual-path embedding scoring (textual captions and visual features), and LLM-based semantic verification. Evaluated on the ADMIRE shared task benchmark, GLIMMER achieves competitive performance across diverse languages without relying on parallel training data or language-specific resources. The results show that using glosses to anchor meaning helps match idioms with images across languages and modalities, and that combining retrieval with reasoning is more robust than using embeddings alone.