OPI at SemEval-2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation

Slawomir Dadas


Abstract
The goal of visual word sense disambiguation is to find the image that best matches the provided description of the word’s meaning. It is a challenging problem, requiring approaches that combine language and image understanding. In this paper, we present our submission to the SemEval-2023 visual word sense disambiguation shared task. The proposed system integrates multimodal embeddings, learning-to-rank methods, and knowledge-based approaches. We build a classifier based on the CLIP model, whose results are enriched with additional information retrieved from Wikipedia and lexical databases. Our solution was ranked third in the multilingual task and won in the Persian track, one of the three language subtasks.
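As a rough illustration of the embedding-based matching the abstract refers to, the sketch below ranks candidate images by cosine similarity between a text embedding and each image embedding. It is not the system described in the paper: the function names (`cosine`, `rank_images`), the file names, and the three-dimensional toy vectors are all hypothetical, and in practice the embeddings would come from an image-text model such as CLIP rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(text_embedding, image_embeddings):
    """Return image names ordered from best to worst match for the text."""
    scores = {
        name: cosine(text_embedding, vec)
        for name, vec in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed values, for illustration only).
text_embedding = [0.9, 0.1, 0.0]
image_embeddings = {
    "img_a.jpg": [0.0, 1.0, 0.0],
    "img_b.jpg": [1.0, 0.0, 0.0],
    "img_c.jpg": [0.5, 0.5, 0.0],
}

ranking = rank_images(text_embedding, image_embeddings)
print(ranking)
```

The first element of `ranking` is the predicted image. A full system along the lines of the abstract would additionally re-rank these candidates and enrich the text with retrieved knowledge, which this sketch does not attempt.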
Anthology ID:
2023.semeval-1.22
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
155–162
URL:
https://aclanthology.org/2023.semeval-1.22
DOI:
10.18653/v1/2023.semeval-1.22
Cite (ACL):
Slawomir Dadas. 2023. OPI at SemEval-2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 155–162, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
OPI at SemEval-2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation (Dadas, SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.22.pdf
Video:
https://aclanthology.org/2023.semeval-1.22.mp4