AraCLIP: Cross-Lingual Learning for Effective Arabic Image Retrieval

Muhammad Al-Barham; Imad Afyouni; Khalid Almubarak; Ashraf Elnagar; Ayad Turky; Ibrahim Hashem

doi:10.18653/v1/2024.arabicnlp-1.9

Abstract

This paper introduces Arabic Contrastive Language-Image Pre-training (AraCLIP), a model designed for Arabic image retrieval tasks, building upon the Contrastive Language-Image Pre-training (CLIP) architecture. AraCLIP leverages Knowledge Distillation to transfer cross-modal knowledge from English to Arabic, enhancing its ability to understand Arabic text and retrieve relevant images. Unlike existing multilingual models, AraCLIP is uniquely positioned to understand the intricacies of the Arabic language, including specific terms, cultural nuances, and contextual constructs. By leveraging the CLIP architecture as our foundation, we introduce a novel approach that seamlessly integrates textual and visual modalities, enabling AraCLIP to effectively retrieve images based on Arabic textual queries. We offer an online demonstration allowing users to input Arabic prompts and compare AraCLIP’s performance with state-of-the-art multilingual models. We conduct comprehensive experiments to evaluate AraCLIP’s performance across diverse datasets, including Arabic XTD-11 and Arabic Flickr8k. Our results showcase AraCLIP’s superiority in image retrieval accuracy, demonstrating its effectiveness in handling Arabic queries. AraCLIP represents a significant advancement in cross-lingual image retrieval, offering promising applications in Arabic language processing and beyond.
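The abstract names two mechanisms: distilling an English text encoder into an Arabic one, and retrieving images from Arabic queries. The sketch below illustrates both on toy NumPy arrays. It is not the authors' implementation. The mean-squared-error distillation objective, the cosine-similarity ranking, the recall@k metric, and every function name here are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale vectors to unit length along the last axis.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def distillation_loss(student_ar, teacher_en):
    # Assumed objective: MSE between the student's embeddings of Arabic
    # captions and the frozen teacher's embeddings of the parallel
    # English captions.
    return float(np.mean((student_ar - teacher_en) ** 2))

def retrieve(query_emb, image_embs, k=5):
    # Rank images by cosine similarity to one text query embedding
    # and return the indices of the top k.
    sims = l2_normalize(image_embs) @ l2_normalize(query_emb)
    return np.argsort(-sims)[:k]

def recall_at_k(text_embs, image_embs, k):
    # Assumes text_embs[i] is the caption for image_embs[i].
    hits = [i in retrieve(text_embs[i], image_embs, k)
            for i in range(len(text_embs))]
    return float(np.mean(hits))

# Toy data standing in for real encoder outputs.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(8, 16))
text_embs = image_embs + 0.05 * rng.normal(size=(8, 16))
print(recall_at_k(text_embs, image_embs, k=1))
```

Under these assumptions, a student trained until the distillation loss is small would place Arabic captions near the teacher's English embeddings, so the teacher's image encoder can be reused unchanged at retrieval time.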

Anthology ID: 2024.arabicnlp-1.9
Volume: Proceedings of the Second Arabic Natural Language Processing Conference
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Nizar Habash, Houda Bouamor, Ramy Eskander, Nadi Tomeh, Ibrahim Abu Farha, Ahmed Abdelali, Samia Touileb, Injy Hamed, Yaser Onaizan, Bashar Alhafni, Wissam Antoun, Salam Khalifa, Hatem Haddad, Imed Zitouni, Badr AlKhamissi, Rawan Almatham, Khalil Mrini
Venues: ArabicNLP | WS
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 102–110
URL: https://aclanthology.org/2024.arabicnlp-1.9/
DOI: 10.18653/v1/2024.arabicnlp-1.9
Cite (ACL): Muhammad Al-Barham, Imad Afyouni, Khalid Almubarak, Ashraf Elnagar, Ayad Turky, and Ibrahim Hashem. 2024. AraCLIP: Cross-Lingual Learning for Effective Arabic Image Retrieval. In Proceedings of the Second Arabic Natural Language Processing Conference, pages 102–110, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): AraCLIP: Cross-Lingual Learning for Effective Arabic Image Retrieval (Al-Barham et al., ArabicNLP 2024)
PDF: https://aclanthology.org/2024.arabicnlp-1.9.pdf
