Taichi Nakatani
2024
Impacts of Misspelled Queries on Translation and Product Search
Greg Hanneman | Natawut Monaikul | Taichi Nakatani
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Machine translation is used in e-commerce to translate second-language queries into the primary language of the store, to be matched by the search system against the product catalog. However, many queries contain spelling mistakes. We first present an analysis of the spelling-robustness of a population of MT systems, quantifying how spelling variations affect MT output, the list of returned products, and ultimately user behavior. We then present two sets of practical experiments illustrating how spelling-robustness may be specifically improved. For MT, reducing the number of BPE operations significantly improves spelling-robustness in six language pairs. In end-to-end e-commerce, the inclusion of a dedicated spelling correction model, and the augmentation of that model’s training data with language-relevant phenomena, each improve robustness and consistency of search results.
Don’t Just Translate, Summarize Too: Cross-lingual Product Title Generation in E-commerce
Bryan Zhang | Taichi Nakatani | Daniel Vidal Hussey | Stephan Walter | Liling Tan
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024
Making product titles informative and concise is vital to delighting e-commerce customers. Recent advances have successfully applied monolingual product title summarization to shorten lengthy product titles. This paper explores the cross-lingual product title generation task, which summarizes and translates a source-language product title into a shortened product title in the target language. Our main contributions are as follows: (i) we investigate the optimal product title length within the scope of e-commerce localization; (ii) we introduce a simple yet effective data filtering technique to train a length-aware machine translation system and compare it to a publicly available LLM; (iii) we propose an automatic approach to validate experimental results using an open-source LLM without human input and show that these evaluation results are consistent with human preferences.
Co-authors
- Greg Hanneman 1
- Daniel Vidal Hussey 1
- Natawut Monaikul 1
- Liling Tan 1
- Stephan Walter 1