Impacts of Misspelled Queries on Translation and Product Search

Greg Hanneman, Natawut Monaikul, Taichi Nakatani


Abstract
Machine translation is used in e-commerce to translate second-language queries into the primary language of the store, to be matched by the search system against the product catalog. However, many queries contain spelling mistakes. We first present an analysis of the spelling-robustness of a population of MT systems, quantifying how spelling variations affect MT output, the list of returned products, and ultimately user behavior. We then present two sets of practical experiments illustrating how spelling-robustness may be specifically improved. For MT, reducing the number of BPE operations significantly improves spelling-robustness in six language pairs. In end-to-end e-commerce, the inclusion of a dedicated spelling correction model, and the augmentation of that model’s training data with language-relevant phenomena, each improve robustness and consistency of search results.
Anthology ID:
2024.acl-long.750
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13907–13920
URL:
https://aclanthology.org/2024.acl-long.750/
DOI:
10.18653/v1/2024.acl-long.750
Cite (ACL):
Greg Hanneman, Natawut Monaikul, and Taichi Nakatani. 2024. Impacts of Misspelled Queries on Translation and Product Search. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13907–13920, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Impacts of Misspelled Queries on Translation and Product Search (Hanneman et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.750.pdf