Multi-Modal Fashion Product Retrieval

Antonio Rubio Romano, LongLong Yu, Edgar Simo-Serra, Francesc Moreno-Noguer


Abstract
Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), compounding the problem. In this paper, we leverage both the images and the textual metadata and propose a joint multi-modal embedding that maps both text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to effectively perform retrieval in this latent space. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset.
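The abstract frames retrieval as nearest-neighbor search in a joint latent space shared by text and images. The sketch below illustrates that idea only; the encoders (embed_image, embed_text), the embedding dimensionality, and the similarity measure are placeholder assumptions for illustration, not the paper's actual architecture or training procedure.

```python
# Minimal sketch of multi-modal retrieval in a shared latent space.
# embed_image / embed_text are hypothetical stand-ins for trained encoders.
import numpy as np

def embed_image(image: np.ndarray, dim: int = 128) -> np.ndarray:
    # Placeholder: a real system would use a trained image network here.
    rng = np.random.default_rng(abs(hash(image.tobytes())) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def embed_text(text: str, dim: int = 128) -> np.ndarray:
    # Placeholder: a real system would use a trained text encoder here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve(query_vec: np.ndarray, catalog: np.ndarray, k: int = 5) -> np.ndarray:
    # Distances in the latent space correspond to product similarity, so
    # retrieval reduces to nearest-neighbor search. With unit-norm vectors,
    # maximizing cosine similarity equals minimizing Euclidean distance.
    sims = catalog @ query_vec
    return np.argsort(-sims)[:k]

# Usage: embed a text query and rank a catalog of image embeddings.
catalog = np.stack([embed_image(np.zeros((8, 8)) + i) for i in range(100)])
query = embed_text("red floral summer dress")
print(retrieve(query, catalog))
```

Because both modalities land in the same space, the same index serves text-to-image, image-to-text, and image-to-image queries.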
Anthology ID: W17-2007
Volume: Proceedings of the Sixth Workshop on Vision and Language
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Anya Belz, Erkut Erdem, Katerina Pastra, Krystian Mikolajczyk
Venue: VL
Publisher: Association for Computational Linguistics
Pages: 43–45
URL: https://aclanthology.org/W17-2007
DOI: 10.18653/v1/W17-2007
Cite (ACL):
Antonio Rubio Romano, LongLong Yu, Edgar Simo-Serra, and Francesc Moreno-Noguer. 2017. Multi-Modal Fashion Product Retrieval. In Proceedings of the Sixth Workshop on Vision and Language, pages 43–45, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Multi-Modal Fashion Product Retrieval (Rubio Romano et al., VL 2017)
PDF: https://aclanthology.org/W17-2007.pdf