Robust Product Classification with Instance-Dependent Noise

Huy Nguyen, Devashish Khatwani


Abstract
Noisy labels in large e-commerce product data (i.e., product items placed into incorrect categories) are a critical issue for the product categorization task because they are unavoidable, non-trivial to remove, and significantly degrade prediction performance. Training a product title classification model that is robust to noisy labels is therefore important for making product classification applications more practical. In this paper, we study the impact of instance-dependent noise on the performance of product title classification by comparing our data denoising algorithm with different noise-resistant training algorithms designed to prevent a classifier from overfitting to noise. We develop a simple yet effective deep neural network for product title classification to use as a base classifier. Along with recent methods of simulating instance-dependent noise, we propose a novel noise simulation algorithm based on product title similarity. Our experiments cover multiple datasets, various noise methods, and different training solutions. The results reveal the limits of the classification task when the noise rate is not negligible and the data distribution is highly skewed.
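The abstract does not spell out how noise is simulated from title similarity, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes token-level Jaccard similarity and a rule that relabels a sampled item with the category of its most similar title from a different category; the names `jaccard` and `simulate_similarity_noise` are hypothetical.

```python
import random


def jaccard(a, b):
    """Token-level Jaccard similarity between two titles (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def simulate_similarity_noise(titles, labels, noise_rate, seed=0):
    """Return a copy of `labels` in which roughly `noise_rate` of the items
    carry the label of their most similar title from another category.

    The corruption depends on each item's own title, which is what makes
    the noise instance-dependent rather than uniform across a class.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(labels)))
    for i in rng.sample(range(len(labels)), n_flip):
        # Only items from other categories can donate a wrong label.
        candidates = [j for j in range(len(labels)) if labels[j] != labels[i]]
        if not candidates:
            continue  # single-class data: nothing to flip to
        best = max(candidates, key=lambda j: jaccard(titles[i], titles[j]))
        noisy[i] = labels[best]
    return noisy
```

Under these assumptions, a title such as "red cotton shirt" would be mislabeled with the category of "red cotton towel" rather than an unrelated category, mimicking the kind of plausible confusion found in real catalogs.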
Anthology ID: 2022.ecnlp-1.20
Volume: Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5)
Month: May
Year: 2022
Address: Dublin, Ireland
Venues: ACL | ECNLP
Publisher: Association for Computational Linguistics
Pages: 171–180
URL: https://aclanthology.org/2022.ecnlp-1.20
DOI: 10.18653/v1/2022.ecnlp-1.20
Cite (ACL): Huy Nguyen and Devashish Khatwani. 2022. Robust Product Classification with Instance-Dependent Noise. In Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5), pages 171–180, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Robust Product Classification with Instance-Dependent Noise (Nguyen & Khatwani, ECNLP 2022)
PDF: https://aclanthology.org/2022.ecnlp-1.20.pdf