Zhe Wu


2024

pdf bib
Centrality-aware Product Retrieval and Ranking
Hadeel Saadany | Swapnil Bhosale | Samarth Agrawal | Diptesh Kanojia | Constantin Orasan | Zhe Wu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

This paper addresses the challenge of improving user experience on e-commerce platforms by enhancing product ranking relevant to user’s search queries. Ambiguity and complexity of user queries often lead to a mismatch between user’s intent and retrieved product titles or documents. Recent approaches have proposed the use of Transformer-based models which need millions of annotated query-title pairs during the pre-training stage, and this data often does not take user intent into account. To tackle this, we curate samples from existing datasets at eBay, manually annotated with buyer-centric relevance scores, and centrality scores which reflect how well the product title matches the user’s intent. We introduce a User-intent Centrality Optimization (UCO) approach for existing models, which optimizes for the user intent in semantic product search. To that end, we propose a dual-loss based optimization to handle hard negatives, i.e., product titles that are semantically relevant but do not reflect the user’s intent. Our contributions include curating challenging evaluation sets and implementing UCO, resulting in significant improvements in product ranking efficiency, observed for different evaluation metrics. Our work aims to ensure that the most buyer-centric titles for a query are ranked higher, thereby, enhancing the user experience on e-commerce platforms.

pdf bib
QueryNER: Segmentation of E-commerce Queries
Chester Palen-Michel | Lizzie Liang | Zhe Wu | Constantine Lignos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present QueryNER, a manually-annotated dataset and accompanying model for e-commerce query segmentation. Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token and entity dropping for null and low recall query recovery. Challenging test sets are created using automatic transformations and show how simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.