Retrieval-Enhanced Dual Encoder Training for Product Matching

Justin Chiu


Abstract
Product matching is the task of matching a seller-listed item to an appropriate product. It is a critical task for an e-commerce platform, and the approach must be efficient enough to run in a large-scale setting. The dual encoder approach has recently become common practice for product matching because of its high performance and computational efficiency. In this paper, we propose a two-stage training procedure for the dual encoder model. Stage 1 trains a dual encoder to identify the more informative training data. Stage 2 then trains on the more informative data to obtain a better dual encoder model. This technique is a learned approach to building training data. We evaluate the retrieval-enhanced training on two different datasets: a publicly available Large-Scale Product Matching dataset and a real-world e-commerce dataset containing 47 million products. Experimental results show that our approach improves F1 by 2% on the public dataset and by 9% on the real-world e-commerce dataset.
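The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading rather than the paper's confirmed method, that "more informative training data" means the products a stage 1 encoder retrieves as nearest neighbors of a listing but that are not its true match. The function name `mine_informative_pairs`, the toy two-dimensional embeddings, and the parameter `k` are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_informative_pairs(listing_vecs, product_vecs, gold, k=2):
    """Build stage 2 training triples (listing_id, product_id, label).

    Assumption: the vectors come from a stage 1 dual encoder. Each listing
    keeps its gold product as a positive (label 1), and every non-gold
    product among its top-k retrieved neighbors becomes a negative (label 0).
    """
    pairs = []
    for lid, lvec in listing_vecs.items():
        # Rank all products by similarity to this listing, best first.
        ranked = sorted(
            product_vecs,
            key=lambda pid: cosine(lvec, product_vecs[pid]),
            reverse=True,
        )
        pairs.append((lid, gold[lid], 1))
        for pid in ranked[:k]:
            if pid != gold[lid]:
                pairs.append((lid, pid, 0))
    return pairs


# Toy embeddings standing in for stage 1 encoder outputs (hypothetical data).
listing_vecs = {"l1": [1.0, 0.0], "l2": [0.0, 1.0]}
product_vecs = {"p1": [0.9, 0.1], "p2": [0.8, 0.3], "p3": [0.0, 1.0]}
gold = {"l1": "p1", "l2": "p3"}

stage2_pairs = mine_informative_pairs(listing_vecs, product_vecs, gold, k=2)
print(stage2_pairs)
```

In a real system the exhaustive ranking would likely be replaced by an approximate nearest-neighbor index, since a catalog of 47 million products is too large to score pairwise; that substitution is likewise an assumption and not something the abstract states.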
Anthology ID:
2023.emnlp-industry.22
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mingxuan Wang, Imed Zitouni
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
216–222
URL:
https://aclanthology.org/2023.emnlp-industry.22
DOI:
10.18653/v1/2023.emnlp-industry.22
Cite (ACL):
Justin Chiu. 2023. Retrieval-Enhanced Dual Encoder Training for Product Matching. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 216–222, Singapore. Association for Computational Linguistics.
Cite (Informal):
Retrieval-Enhanced Dual Encoder Training for Product Matching (Chiu, EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-industry.22.pdf
Video:
https://aclanthology.org/2023.emnlp-industry.22.mp4