Adapting Vision-Language Models for E-commerce Understanding at Scale

Matteo Nulli; Orshulevich Vladimir; Tala Bazazo; Christian Herold; Michael Kozielski; Marcin Mazur; Szymon Tuzel; Cees Snoek; Seyyed Hadi Hashemi; Omar Javed; Yannick Versley; Shahram Khadivi

Abstract

E-commerce product understanding by nature demands strong multimodal comprehension of text, images, and structured attributes. General-purpose Vision–Language Models (VLMs) enable generalizable multimodal latent modelling, yet there is no well-documented strategy for adapting them to the attribute-centric, multi-image, and noisy nature of e-commerce data without sacrificing general performance. In this work, we show, through a large-scale experimental study, how targeted adaptation of general VLMs can substantially improve e-commerce performance while preserving broad multimodal capabilities. Furthermore, we propose a novel, extensive evaluation suite covering deep product understanding, strict instruction following, and dynamic attribute extraction.

Anthology ID: 2026.eacl-industry.38
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 496–512
URL: https://aclanthology.org/2026.eacl-industry.38/
Cite (ACL): Matteo Nulli, Orshulevich Vladimir, Tala Bazazo, Christian Herold, Michael Kozielski, Marcin Mazur, Szymon Tuzel, Cees G. M. Snoek, Seyyed Hadi Hashemi, Omar Javed, Yannick Versley, and Shahram Khadivi. 2026. Adapting Vision-Language Models for E-commerce Understanding at Scale. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 496–512, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Adapting Vision-Language Models for E-commerce Understanding at Scale (Nulli et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-industry.38.pdf