2018
Neural Network based Extreme Classification and Similarity Models for Product Matching
Kashif Shah | Selcuk Kopru | Jean-David Ruvini
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)
Matching a seller-listed item to an appropriate product has become one of the most fundamental and significant steps for e-commerce platforms that offer a product-based experience. It has a huge impact on search effectiveness, search engine optimization, product reviews, and product price estimation, among many other advantages for a better user experience. As significant and vital as it has become, the challenge of tackling its complexity has grown with the exponential growth of individual and business sellers trading millions of products every day. We explored two approaches: classification based on a shallow neural network and similarity based on a deep siamese network. These models outperform the baseline by more than 5% in terms of accuracy and are capable of extremely efficient training and inference.
Semi-Supervised Learning with Auxiliary Evaluation Component for Large Scale e-Commerce Text Classification
Mingkuan Liu | Musen Wen | Selcuk Kopru | Xianjing Liu | Alan Lu
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
The lack of high-quality labeled training data has been one of the critical challenges facing many industrial machine learning tasks. To tackle this challenge, in this paper, we propose a semi-supervised learning method to utilize unlabeled data and user feedback signals to improve the performance of ML models. The method employs a primary model Main and an auxiliary evaluation model Eval, where Main and Eval models are trained iteratively by automatically generating labeled data from unlabeled data and/or users’ feedback signals. The proposed approach is applied to different text classification tasks. We report results on both the publicly available Yahoo! Answers dataset and our e-commerce product classification dataset. The experimental results show that the proposed method reduces the classification error rate by 4% and up to 15% across various experimental setups and datasets. A detailed comparison with other semi-supervised learning approaches is also presented later in the paper. The results from various text classification tasks demonstrate that our method outperforms those developed in previous related studies.
2015
Topic adaptation for machine translation of e-commerce content
Prashant Mathur | Marcello Federico | Selçuk Köprü | Sharam Khadivi | Hassan Sawaf
Proceedings of Machine Translation Summit XV: Papers
2010
Improving Reordering in Statistical Machine Translation from Farsi
Evgeny Matusov | Selçuk Köprü
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
In this paper, we propose a novel model for scoring reordering in phrase-based statistical machine translation (SMT) and successfully use it for translation from Farsi into English and Arabic. The model replaces the distance-based distortion model that is widely used in most SMT systems. The main idea of the model is to penalize each new deviation from the monotonic translation path. We also propose a way to combine this model with manually created reordering rules for Farsi, which try to alleviate the difference in sentence structure between Farsi and English/Arabic by changing the position of the verb. The rules are used in the SMT search as soft constraints. In experiments on two general-domain translation tasks, the proposed penalty-based model improves the BLEU score by up to 1.5% absolute compared to the baseline of monotonic translation, and by up to 1.2% compared to the distance-based distortion model.
AppTek’s APT machine translation system for IWSLT 2010
Evgeny Matusov | Selçuk Köprü
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we describe AppTek’s new APT machine translation system that we employed in the IWSLT 2010 evaluation campaign. This year, we participated in the Arabic-to-English and Turkish-to-English BTEC tasks. We discuss the architecture of the system, the preprocessing steps and the experiments carried out during the campaign. We show that competitive translation quality can be obtained with a system that can be turned into a real-life product without much effort.
2009
AppTek Turkish-English machine translation system description for IWSLT 2009
Selçuk Köprü
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we describe the techniques explored in the AppTek system to enhance translations in the Turkish-to-English track of IWSLT09. The submission was generated using a phrase-based statistical machine translation system. We also investigated the use of morpho-syntactic information and the application of word reordering to improve the translation results. The results are evaluated based on BLEU and METEOR scores. We show that the use of morpho-syntactic information yields a gain of 3 BLEU points in the overall system.
A Unification based Approach to the Morphological Analysis and Generation of Arabic
Selçuk Köprü | Jude Miller
Proceedings of the Third Workshop on Computational Approaches to Arabic-Script-based Languages (CAASL3)
Lattice Parsing to Integrate Speech Recognition and Rule-Based Machine Translation
Selçuk Köprü | Adnan Yazıcı
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)