Tarun Agarwal
2025
Learning to Rewrite Negation Queries in Product Search
Mengtian Guo | Mutasem Al-Darabsah | Choon Hui Teo | Jonathan May | Tarun Agarwal | Rahul Bhagat
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
In product search, negation is frequently used to articulate unwanted product features or components. Modern search engines often struggle to comprehend negations, resulting in suboptimal user experiences. While various methods have been proposed to handle negations in search, none of them took the vocabulary gap between query keywords and product text into consideration. In this work, we introduce a query rewriting approach to improve the performance of product search engines on queries with negations. First, we present a data generation workflow that leverages large language models (LLMs) to extract query rewrites from product text. We then train a Seq2Seq model to generate query rewrites for unseen queries. Our experiments show that query rewriting yields a 3.17% improvement in precision@30 for queries with negations. These promising results pave the way for further research on improving search performance for queries with negations.
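The paper itself does not include code; the following is a minimal, hypothetical sketch of the inference step described in the abstract — using an off-the-shelf Seq2Seq checkpoint (t5-small, a stand-in for the authors' trained model) to rewrite a negation query. The model name, prompt format, and decoding parameters are illustrative assumptions, not the authors' released setup.

```python
# Hypothetical sketch of Seq2Seq query rewriting for negation queries.
# The checkpoint, prompt prefix, and generation settings are assumptions
# for illustration only, not the model or data format from the paper.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"  # stand-in for a model fine-tuned on LLM-extracted rewrites

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def rewrite_query(query: str) -> str:
    """Generate a rewrite for a product-search query containing a negation."""
    inputs = tokenizer(f"rewrite query: {query}", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example: a negation query whose rewrite could bridge the vocabulary gap
# with product text (e.g., "sulfate-free shampoo").
print(rewrite_query("shampoo without sulfates"))
```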
2017
Bingo at IJCNLP-2017 Task 4: Augmenting Data using Machine Translation for Cross-linguistic Customer Feedback Classification
Heba Elfardy | Manisha Srivastava | Wei Xiao | Jared Kramer | Tarun Agarwal
Proceedings of the IJCNLP 2017, Shared Tasks
The ability to automatically and accurately process customer feedback is a necessity in the private sector. Unfortunately, customer feedback can be one of the most difficult types of data to work with due to the sheer volume and variety of services, products, languages, and cultures that comprise the customer experience. To address this issue, our team built a suite of classifiers trained on a four-language, multi-label corpus released as part of the shared task on “Customer Feedback Analysis” at IJCNLP 2017. In addition to standard text preprocessing, we translated each dataset into each other language to increase the size of the training datasets. We also used word embeddings in our feature engineering step. Ultimately, we trained classifiers using Logistic Regression, Random Forest, and Long Short-Term Memory (LSTM) Recurrent Neural Networks. Overall, we achieved a Macro-Average F-score between 48.7% and 56.0% for the four languages and ranked 3/12 for English, 3/7 for Spanish, 1/8 for French, and 2/7 for Japanese.
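As a rough illustration of one classifier in the suite described above, here is a hypothetical sketch of multi-label customer feedback classification with TF-IDF features and one-vs-rest Logistic Regression. The example texts and label set are invented for illustration; the shared-task corpus and the translation-based augmentation step are not reproduced here.

```python
# Hypothetical sketch: TF-IDF + one-vs-rest Logistic Regression for
# multi-label customer feedback classification. Texts and labels below
# are illustrative placeholders, not the IJCNLP 2017 shared-task data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "The delivery was late and the box was damaged.",
    "Great product, exactly as described!",
    "How do I return this item?",
]
labels = [["complaint"], ["comment"], ["request"]]

# Encode the multi-label targets as a binary indicator matrix.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)

pred = clf.predict(["The package arrived broken."])
print(mlb.inverse_transform(pred))
```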