Allen Williams


2022

Data Quality Estimation Framework for Faster Tax Code Classification
Ravi Kondadadi | Allen Williams | Nicolas Nicolov
Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5)

This paper describes a novel framework for estimating the data quality of a collection of product descriptions, with the goal of identifying the information required to classify product listings accurately for tax-code assignment. Our Data Quality Estimation (DQE) framework consists of a Question Answering (QA) based attribute value extraction model to identify missing attributes and a classification model to identify low-quality records. We show that our framework can accurately predict the quality of product descriptions. In addition to identifying low-quality product listings, our framework can also generate a detailed category-level report showing missing product information, which results in a better customer experience.