Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation

Zeinab Taghavi; Parsa Haghighi Naeini; Mohammad Ali Sadraei Javaheri; Soroush Gooran; Ehsaneddin Asgari; Hamid Reza Rabiee; Hossein Sameti

doi:10.18653/v1/2023.semeval-1.269

Abstract

This paper presents an approach to Visual Word Sense Disambiguation (Visual-WSD), the task of selecting the image that best represents a given polysemous word in one particular sense. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” in a zero-shot learning setting. We compared the accuracy of different combinations of tools: “Simple prompt-based” and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing the input modality from text to image. We also explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results show that the proposed approach outperforms cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. In this zero-shot setting, our best attempt attained an accuracy of 68.75%.
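The two comparison modes named in the abstract can be illustrated with a minimal sketch. It assumes that both modes reduce to ranking candidate images by cosine similarity against a query embedding: the embedding of the text prompt in the cross-modality case, and the embedding of an image generated from that prompt in the unimodality case. The function names and the toy 3-dimensional vectors are hypothetical; in practice the vectors would come from CLIP's text and image encoders, and this is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_candidates(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(candidate_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for CLIP embeddings (hypothetical values).
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]

# Cross-modality: the query is the embedding of the text prompt.
text_query = [0.9, 0.1, 0.0]
# Unimodality: the query is the embedding of an image generated from the prompt.
generated_image = [0.1, 0.9, 0.1]

cross_modal_ranking = rank_candidates(text_query, candidates)
unimodal_ranking = rank_candidates(generated_image, candidates)
```

The top-ranked index in each list is the predicted image for that mode; accuracy is then the fraction of instances where that index matches the gold image.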

Anthology ID: 2023.semeval-1.269
Volume: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 1960–1964
URL: https://aclanthology.org/2023.semeval-1.269
DOI: 10.18653/v1/2023.semeval-1.269
Cite (ACL): Zeinab Taghavi, Parsa Haghighi Naeini, Mohammad Ali Sadraei Javaheri, Soroush Gooran, Ehsaneddin Asgari, Hamid Reza Rabiee, and Hossein Sameti. 2023. Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1960–1964, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation (Taghavi et al., SemEval 2023)
PDF: https://aclanthology.org/2023.semeval-1.269.pdf
Video: https://aclanthology.org/2023.semeval-1.269.mp4
