CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations

Gitanjali Kumari, Arindam Chatterjee, Ashutosh Bajpai, Asif Ekbal, Vinutha B. NarayanaMurthy
Abstract
In this paper, we present CMCLIP, a Code-Mixed Contrastive Linked Image Pre-trained model, an innovative extension of the widely recognized CLIP model. Our work adapts the CLIP framework to the code-mixed environment through a novel cross-lingual teacher training methodology. Building on the strengths of CLIP, we introduce the first code-mixed pre-trained text-and-vision model, CMCLIP, specifically designed for Hindi-English code-mixed multimodal language settings. The model is developed in two variants: CMCLIP-RB, based on ResNet, and CMCLIP-VX, based on ViT, both of which adapt the original CLIP model to suit code-mixed data. We also introduce a large, novel dataset called Parallel Hybrid Multimodal Code-mixed Hinglish (PHMCH), which forms the foundation for teacher training. The CMCLIP models are evaluated on various downstream tasks, including code-mixed Image-Text Retrieval (ITR) and classification tasks, such as humor and sarcasm detection, using a code-mixed meme dataset. Our experimental results demonstrate that CMCLIP outperforms existing models, such as M3P and multilingual-CLIP, establishing state-of-the-art performance for code-mixed multimodal tasks. We also note that although our data and frameworks focus on Hindi-English code-mixing, they can be extended to other code-mixed language settings.
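The abstract combines a CLIP-style contrastive model with cross-lingual teacher training, but this page does not spell out the training objectives. As a rough illustration only, the NumPy sketch below shows the two standard losses such a setup typically pairs: the symmetric in-batch InfoNCE objective from the original CLIP, and an MSE distillation term of the kind used in multilingual-CLIP-style teacher learning, where a student text encoder for code-mixed captions regresses onto a frozen English teacher's embeddings. Function names and the exact loss combination are assumptions, not the paper's stated recipe.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere, as CLIP does before scoring."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over an in-batch similarity matrix (CLIP-style).

    Matching image/text pairs sit on the diagonal of the (B, B) logit matrix;
    the loss averages the image->text and text->image cross-entropies.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))
    logits = img @ txt.T / temperature            # (B, B) scaled cosine similarities
    labels = np.arange(len(logits))               # correct pair = same batch index

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(y)), y].mean()

    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

def distillation_loss(teacher_txt_emb, student_txt_emb):
    """Cross-lingual teacher training term (assumed MSE form).

    The student encodes the code-mixed caption and is pulled toward the
    frozen teacher's embedding of the parallel English caption, so the
    pre-trained CLIP image tower can be reused unchanged.
    """
    t = np.asarray(teacher_txt_emb, dtype=float)
    s = np.asarray(student_txt_emb, dtype=float)
    return np.mean((t - s) ** 2)
```

With identical image and text embeddings the contrastive loss is close to zero (well below the chance value log B), and the distillation loss vanishes when student and teacher agree exactly; this is only a sketch of the generic objectives, not CMCLIP's published training procedure.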
Anthology ID:
2024.icon-1.36
Volume:
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2024
Address:
AU-KBC Research Centre, Chennai, India
Editors:
Sobha Lalitha Devi, Karunesh Arora
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
311–323
URL:
https://aclanthology.org/2024.icon-1.36/
Cite (ACL):
Gitanjali Kumari, Arindam Chatterjee, Ashutosh Bajpai, Asif Ekbal, and Vinutha B. NarayanaMurthy. 2024. CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 311–323, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal):
CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations (Kumari et al., ICON 2024)
PDF:
https://aclanthology.org/2024.icon-1.36.pdf