Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation

Issa Sugiura; Shuhei Kurita; Yusuke Oda; Daisuke Kawahara; Naoaki Okazaki

doi:10.18653/v1/2025.naacl-srw.15

Abstract

CLIP is a foundational model that bridges images and text, widely adopted as a key component in numerous vision-language models. However, the lack of large-scale open Japanese image-text pairs poses a significant barrier to the development of Japanese vision-language models. In this study, we constructed a Japanese image-text pair dataset with 1.5 billion examples using machine translation with open-weight LLMs and pre-trained Japanese CLIP models on the dataset. The performance of the pre-trained models was evaluated across seven benchmark datasets, achieving competitive average scores compared to models of similar size without the need for extensive data curation. However, the results also revealed relatively low performance on tasks specific to Japanese culture, highlighting the limitations of translation-based approaches in capturing cultural nuances. Our dataset, models, and code are publicly available.

Anthology ID: 2025.naacl-srw.15
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: April
Year: 2025
Address: Albuquerque, USA
Editors: Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues: NAACL | WS
Publisher: Association for Computational Linguistics
Pages: 162–170
URL: https://aclanthology.org/2025.naacl-srw.15/
DOI: 10.18653/v1/2025.naacl-srw.15
Cite (ACL): Issa Sugiura, Shuhei Kurita, Yusuke Oda, Daisuke Kawahara, and Naoaki Okazaki. 2025. Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 162–170, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal): Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation (Sugiura et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-srw.15.pdf
