Keiji Shinzato


2022

pdf bib
Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
Wei-Te Chen | Yandi Xia | Keiji Shinzato
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

Although most studies have treated attribute value extraction (AVE) as named entity recognition, these approaches are not practical in real-world e-commerce platforms because they perform poorly, and require canonicalization of extracted values. Furthermore, since values needed for actual services is static in many attributes, extraction of new values is not always necessary. Given the above, we formalize AVE as extreme multi-label classification (XMC). A major problem in solving AVE as XMC is that the distribution between positive and negative labels for products is heavily imbalanced. To mitigate the negative impact derived from such biased distribution, we propose label masking, a simple and effective method to reduce the number of negative labels in training. We exploit attribute taxonomy designed for e-commerce platforms to determine which labels are negative for products. Experimental results using a dataset collected from a Japanese e-commerce platform demonstrate that the label masking improves micro and macro F1 scores by 3.38 and 23.20 points, respectively.

pdf bib
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction
Keiji Shinzato | Naoki Yoshinaga | Yandi Xia | Wei-Te Chen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A key challenge in attribute value extraction (AVE) from e-commerce sites is how to handle a large number of attributes for diverse products. Although this challenge is partially addressed by a question answering (QA) approach which finds a value in product data for a given query (attribute), it does not work effectively for rare and ambiguous queries. We thus propose simple knowledge-driven query expansion based on possible answers (values) of a query (attribute) for QA-based AVE. We retrieve values of a query (attribute) from the training data to expand the query. We train a model with two tricks, knowledge dropout and knowledge token mixing, which mimic the imperfection of the value knowledge in testing. Experimental results on our cleaned version of AliExpress dataset show that our method improves the performance of AVE (+6.08 macro F1), especially for rare and ambiguous attributes (+7.82 and +6.86 macro F1, respectively).

2020

pdf bib
ILP-based Opinion Sentence Extraction from User Reviews for Question DB Construction
Masakatsu Hamashita | Takashi Inui | Koji Murakami | Keiji Shinzato
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

2017

pdf bib
Large-Scale Categorization of Japanese Product Titles Using Neural Attention Models
Yandi Xia | Aaron Levine | Pradipto Das | Giuseppe Di Fabbrizio | Keiji Shinzato | Ankur Datta
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We propose a variant of Convolutional Neural Network (CNN) models, the Attention CNN (ACNN); for large-scale categorization of millions of Japanese items into thirty-five product categories. Compared to a state-of-the-art Gradient Boosted Tree (GBT) classifier, the proposed model reduces training time from three weeks to three days while maintaining more than 96% accuracy. Additionally, our proposed model characterizes products by imputing attentive focus on word tokens in a language agnostic way. The attention words have been observed to be semantically highly correlated with the predicted categories and give us a choice of automatic feature extraction for downstream processing.

2013

pdf bib
Precise Information Retrieval Exploiting Predicate-Argument Structures
Daisuke Kawahara | Keiji Shinzato | Tomohide Shibata | Sadao Kurohashi
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Unsupervised Extraction of Attributes and Their Values from Product Description
Keiji Shinzato | Satoshi Sekine
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2010

pdf bib
Exploiting Term Importance Categories and Dependency Relations for Natural Language Search
Keiji Shinzato | Sadao Kurohashi
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)

2008

pdf bib
A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure
Keiji Shinzato | Daisuke Kawahara | Chikara Hashimoto | Sadao Kurohashi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In recent years, language resources acquired from theWeb are released, and these data improve the performance of applications in several NLP tasks. Although the language resources based on the web page unit are useful in NLP tasks and applications such as knowledge acquisition, document retrieval and document summarization, such language resources are not released so far. In this paper, we propose a data format for results of web page processing, and a search engine infrastructure which makes it possible to share approximately 100 million Japanese web data. By obtaining the web data, NLP researchers are enabled to begin their own processing immediately without analyzing web pages by themselves.

pdf bib
TSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology
Keiji Shinzato | Tomohide Shibata | Daisuke Kawahara | Chikara Hashimoto | Sadao Kurohashi
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2004

pdf bib
Extracting Hyponyms of Prespecified Hypernyms from Itemizations and Headings in Web Documents
Keiji Shinzato | Kentaro Torisawa
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Acquiring Hyponymy Relations from Web Documents
Keiji Shinzato | Kentaro Torisawa
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004