PoliTo at SemEval-2023 Task 1: CLIP-based Visual-Word Sense Disambiguation Based on Back-Translation

Lorenzo Vaiani, Luca Cagliero, Paolo Garza


Abstract
Visual-Word Sense Disambiguation (V-WSD) entails resolving the linguistic ambiguity in a text by selecting a clarifying image from a set of (potentially misleading) candidates. In this paper, we address V-WSD using a state-of-the-art Image-Text Retrieval system, namely CLIP. We propose to alleviate the linguistic ambiguity across multiple domains and languages via text and image augmentation. To augment the textual content, we rely on back-translation with the aid of a variety of auxiliary languages. Fine-tuning CLIP on the full phrases is effective at accurately disambiguating words, and incorporating back-translation enhances the system’s robustness and performance on the test samples written in Indo-European languages.
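The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how back-translated variants are combined, so averaging their text embeddings at query time is an assumption made here for illustration only. The names `embed_text`, `back_translate`, and `pivots` are hypothetical placeholders for a CLIP-style text encoder, a translation round trip through an auxiliary language, and the list of auxiliary languages. Embeddings are plain lists of floats so the example runs without any model download.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_candidates(
    phrase: str,
    image_embeddings: Dict[str, Vector],
    embed_text: Callable[[str], Vector],         # hypothetical text encoder
    back_translate: Callable[[str, str], str],   # hypothetical round trip
    pivots: Sequence[str],                       # hypothetical auxiliary languages
) -> List[str]:
    """Rank candidate images for a phrase, best match first.

    Assumption: the query is the mean embedding of the original phrase
    and its back-translated variants. The source does not specify this.
    """
    variants = [phrase] + [back_translate(phrase, lang) for lang in pivots]
    query = mean_vector([embed_text(text) for text in variants])
    scores = {
        name: cosine(query, embedding)
        for name, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data standing in for real encoder outputs (values are made up).
toy_text_embeddings = {
    "bank erosion": [1.0, 0.2],
    "riverbank erosion": [0.9, 0.4],
}
toy_image_embeddings = {
    "river.jpg": [1.0, 0.5],
    "atm.jpg": [0.1, 1.0],
}

ranking = rank_candidates(
    "bank erosion",
    toy_image_embeddings,
    embed_text=lambda text: toy_text_embeddings[text],
    back_translate=lambda text, lang: "riverbank erosion",
    pivots=["it"],
)
print(ranking)
```

In a real system the toy dictionaries would be replaced by the text and image encoders of a CLIP model and by an actual machine translation round trip. Back-translation could equally be applied as training-time augmentation during fine-tuning rather than at query time.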
Anthology ID:
2023.semeval-1.199
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1447–1453
URL:
https://aclanthology.org/2023.semeval-1.199
DOI:
10.18653/v1/2023.semeval-1.199
Cite (ACL):
Lorenzo Vaiani, Luca Cagliero, and Paolo Garza. 2023. PoliTo at SemEval-2023 Task 1: CLIP-based Visual-Word Sense Disambiguation Based on Back-Translation. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1447–1453, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
PoliTo at SemEval-2023 Task 1: CLIP-based Visual-Word Sense Disambiguation Based on Back-Translation (Vaiani et al., SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.199.pdf