Retrieval Augmented Spelling Correction for E-Commerce Applications

Xuan Guo, Rohit Patki, Dante Everaert, Christopher Potts


Abstract
The rapid introduction of new brand names into everyday language poses a unique challenge for e-commerce spelling correction services, which must distinguish genuine misspellings from novel brand names that use unconventional spelling. We seek to address this challenge via Retrieval Augmented Generation (RAG). In this approach, product names are retrieved from a catalog and incorporated into the context used by a large language model (LLM) that has been fine-tuned to do contextual spelling correction. Through quantitative evaluation and qualitative error analyses, we find that the RAG framework improves spelling correction over a stand-alone LLM. We also demonstrate the value of additional fine-tuning of the LLM to incorporate retrieved context.
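The retrieve-then-prompt flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the catalog entries, the function names `retrieve_product_names` and `build_prompt`, the prompt wording, and the use of `difflib` string similarity as the retriever are all assumptions made for the example. The fine-tuned LLM call itself is omitted.

```python
import difflib

# Hypothetical catalog; these entries are illustrative and not taken from the paper.
CATALOG = [
    "playstation 5 console",
    "plackers dental floss",
    "kleenex tissues",
    "lego classic bricks",
]


def retrieve_product_names(query, catalog, k=3, cutoff=0.3):
    """Return up to k catalog entries most similar to the query.

    difflib string similarity is a stand-in for whatever retriever
    a real system would use.
    """
    return difflib.get_close_matches(query.lower(), catalog, n=k, cutoff=cutoff)


def build_prompt(query, retrieved):
    """Place retrieved product names in the context given to the LLM."""
    context = "\n".join(f"- {name}" for name in retrieved)
    return (
        "Catalog product names that may be relevant:\n"
        f"{context}\n"
        f"Search query: {query}\n"
        "Corrected query:"
    )


if __name__ == "__main__":
    query = "playstaton 5 console"
    names = retrieve_product_names(query, CATALOG)
    # The resulting prompt would be passed to a spelling-correction LLM (not shown).
    print(build_prompt(query, names))
```

The point of the sketch is that the model sees candidate catalog names next to the query, so an unconventional but valid brand spelling can be recognized from context instead of being "corrected" away.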
Anthology ID:
2024.emnlp-industry.7
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
73–79
URL:
https://aclanthology.org/2024.emnlp-industry.7
Cite (ACL):
Xuan Guo, Rohit Patki, Dante Everaert, and Christopher Potts. 2024. Retrieval Augmented Spelling Correction for E-Commerce Applications. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 73–79, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
Retrieval Augmented Spelling Correction for E-Commerce Applications (Guo et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.7.pdf