Data Quality Estimation Framework for Faster Tax Code Classification

Ravi Kondadadi, Allen Williams, Nicolas Nicolov


Abstract
This paper describes a novel framework that estimates the data quality of a collection of product descriptions and identifies the information required to classify product listings accurately for tax-code assignment. Our Data Quality Estimation (DQE) framework consists of a Question Answering (QA)-based attribute value extraction model that identifies missing attributes and a classification model that identifies low-quality records. We show that our framework can accurately predict the quality of product descriptions. In addition to identifying low-quality product listings, our framework can also generate a detailed category-level report of missing product information, which results in a better customer experience.
Anthology ID:
2022.ecnlp-1.4
Volume:
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Shervin Malmasi, Oleg Rokhlenko, Nicola Ueffing, Ido Guy, Eugene Agichtein, Surya Kallumadi
Venue:
ECNLP
Publisher:
Association for Computational Linguistics
Pages:
29–34
URL:
https://aclanthology.org/2022.ecnlp-1.4
DOI:
10.18653/v1/2022.ecnlp-1.4
Cite (ACL):
Ravi Kondadadi, Allen Williams, and Nicolas Nicolov. 2022. Data Quality Estimation Framework for Faster Tax Code Classification. In Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5), pages 29–34, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Data Quality Estimation Framework for Faster Tax Code Classification (Kondadadi et al., ECNLP 2022)
PDF:
https://aclanthology.org/2022.ecnlp-1.4.pdf
Video:
https://aclanthology.org/2022.ecnlp-1.4.mp4