SLT at SemEval-2023 Task 1: Enhancing Visual Word Sense Disambiguation through Image Text Retrieval using BLIP

Mohammadreza Molavi, Hossein Zeinali


Abstract
Based on recent progress in image-text retrieval techniques, this paper presents a fine-tuned model for the Visual Word Sense Disambiguation (VWSD) task. The proposed system fine-tunes a pre-trained BLIP model using image-text contrastive (ITC) and image-text matching (ITM) losses, and employs a candidate selection approach for faster inference. The system was trained on the VWSD task dataset and evaluated on a separate test set using the Mean Reciprocal Rank (MRR) metric. It was also tested on the provided test set, which contains Persian and Italian data, and the results were evaluated separately for each language. Our system demonstrates the potential of fine-tuning pre-trained models for complex language tasks and provides insights for further research in image-text retrieval.
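The abstract does not spell out how candidate selection works, so the sketch below assumes the two-stage retrieval scheme used by BLIP itself: rank every candidate image by the cheap ITC similarity, keep the top `k`, and re-rank only that shortlist with the more expensive ITM head. It then scores the resulting rankings with MRR, the task metric. The score values are placeholders, and the function names (`two_stage_rank`, `reciprocal_rank`, `mean_reciprocal_rank`) are hypothetical; this is an illustration of the technique, not the authors' code.

```python
from typing import Callable, Sequence


def two_stage_rank(
    itc_scores: Sequence[float],
    itm_score: Callable[[int], float],
    k: int,
) -> list[int]:
    """Rank candidate image indices for one ambiguous phrase (hypothetical helper).

    Stage 1: sort all candidates by ITC similarity (cheap, one dot product each).
    Stage 2: re-score only the top-k shortlist with the ITM scorer (expensive).
    Candidates outside the shortlist keep their ITC order after the re-ranked block.
    """
    by_itc = sorted(range(len(itc_scores)), key=lambda i: itc_scores[i], reverse=True)
    shortlist, rest = by_itc[:k], by_itc[k:]
    reranked = sorted(shortlist, key=itm_score, reverse=True)
    return reranked + rest


def reciprocal_rank(ranking: Sequence[int], gold: int) -> float:
    """1 / (1-based position of the gold image in the ranking)."""
    return 1.0 / (ranking.index(gold) + 1)


def mean_reciprocal_rank(rankings: Sequence[Sequence[int]], golds: Sequence[int]) -> float:
    """Average reciprocal rank over all evaluated phrases."""
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / len(golds)


if __name__ == "__main__":
    # Placeholder scores for four candidate images of a single phrase.
    itc = [0.2, 0.9, 0.5, 0.1]
    itm = {0: 0.3, 1: 0.4, 2: 0.8, 3: 0.99}
    ranking = two_stage_rank(itc, itm.__getitem__, k=2)
    print(ranking)  # [2, 1, 0, 3]
    print(mean_reciprocal_rank([ranking, ranking], [1, 2]))  # 0.75
```

The example also shows the trade-off behind the speed-up: image 3 has the highest ITM score but never reaches the ITM stage because its ITC similarity is low, so a larger `k` buys accuracy at the cost of more ITM forward passes.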
Anthology ID:
2023.semeval-1.264
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1921–1925
URL:
https://aclanthology.org/2023.semeval-1.264
DOI:
10.18653/v1/2023.semeval-1.264
Cite (ACL):
Mohammadreza Molavi and Hossein Zeinali. 2023. SLT at SemEval-2023 Task 1: Enhancing Visual Word Sense Disambiguation through Image Text Retrieval using BLIP. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1921–1925, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
SLT at SemEval-2023 Task 1: Enhancing Visual Word Sense Disambiguation through Image Text Retrieval using BLIP (Molavi & Zeinali, SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.264.pdf