Misspelling Detection from Noisy Product Images

Varun Nagaraj Rao, Mingwei Shen


Abstract
Misspellings are introduced on products either due to negligence or as an attempt to deliberately deceive stakeholders. This leads to revenue loss for online sellers and fosters customer mistrust. Existing spelling research has primarily focused on advances in misspelling correction, while misspelling detection has continued to rely on lookup in a large dictionary. Dictionary lookup incorrectly flags many non-dictionary words as misspellings. In this paper, we propose a method to automatically detect misspellings from product images in an attempt to reduce false positive detections. We curate a large-scale corpus, define a rich set of features, and propose a novel model that leverages importance weighting to account for within-class distributional variance. Finally, we experimentally validate this approach on both the curated corpus and an out-of-domain public dataset and show that it leads to a relative improvement of up to 20% in F1 score. The approach thus creates a more robust, generalized, deployable solution and reduces reliance on the large-scale custom dictionaries used today.
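The dictionary-lookup baseline that the abstract criticizes can be illustrated with a minimal sketch. This is not the authors' code: the word list, the tokens, and the function name flag_misspellings are all hypothetical, chosen only to show how a valid non-dictionary word (here a made-up brand name) gets flagged alongside a genuine misspelling.

```python
# Hypothetical toy dictionary; a real system would use a far larger word list.
DICTIONARY = {"organic", "green", "tea", "by", "natural", "shampoo"}


def flag_misspellings(tokens, dictionary=DICTIONARY):
    """Return alphabetic tokens not found in the dictionary (case-insensitive)."""
    return [t for t in tokens if t.isalpha() and t.lower() not in dictionary]


# Made-up text as it might be read off a product image.
tokens = ["Organic", "Grean", "Tea", "by", "Zyrtex"]
flagged = flag_misspellings(tokens)

# "Grean" is a genuine misspelling; "Zyrtex" is an invented brand name that
# the lookup flags only because it is missing from the dictionary.
print(flagged)
```

In this toy case the lookup cannot tell the misspelling from the brand name, which is the kind of false positive the paper sets out to reduce.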
Anthology ID:
2020.coling-industry.12
Volume:
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track
Month:
December
Year:
2020
Address:
Online
Editors:
Ann Clifton, Courtney Napoles
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
124–135
URL:
https://aclanthology.org/2020.coling-industry.12
DOI:
10.18653/v1/2020.coling-industry.12
Cite (ACL):
Varun Nagaraj Rao and Mingwei Shen. 2020. Misspelling Detection from Noisy Product Images. In Proceedings of the 28th International Conference on Computational Linguistics: Industry Track, pages 124–135, Online. International Committee on Computational Linguistics.
Cite (Informal):
Misspelling Detection from Noisy Product Images (Nagaraj Rao & Shen, COLING 2020)
PDF:
https://aclanthology.org/2020.coling-industry.12.pdf
Code:
amzn/image-misspell-coling2020