2023
Lattice Path Edit Distance: A Romanization-aware Edit Distance for Extracting Misspelling-Correction Pairs from Japanese Search Query Logs
Nobuhiro Kaji
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Edit distance has been successfully used to extract training data, i.e., misspelling-correction pairs, of spelling correction models from search query logs in languages including English. However, the success does not readily apply to Japanese, where misspellings are often dissimilar to correct spellings due to the romanization-based input methods. To address this problem, we introduce lattice path edit distance, which utilizes romanization lattices to efficiently consider all possible romanized forms of input strings. Empirical experiments using Japanese search query logs demonstrated that the lattice path edit distance outperformed baseline methods including the standard edit distance combined with an existing transliterator and morphological analyzer. A training data collection pipeline that uses the lattice path edit distance has been deployed in production at our search engine for over a year.
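For reference, the standard edit distance that the abstract above names as a baseline can be sketched as follows. This is the textbook Levenshtein dynamic program, not the lattice path edit distance the paper proposes; the function name `edit_distance` and the example strings are illustrative assumptions, not taken from the paper.

```python
def edit_distance(source: str, target: str) -> int:
    """Return the Levenshtein distance between two strings (textbook version)."""
    # previous[j] is the distance between the source prefix handled so far
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # A hypothetical English query pair: the swapped letters cost two substitutions.
    print(edit_distance("recieve", "receive"))
```

A pipeline of the kind described would keep query pairs whose distance falls under some threshold. The abstract's point is that such a threshold works poorly for Japanese when the distance is computed on surface strings alone.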
2019
Conversation Initiation by Diverse News Contents Introduction
Satoshi Akasaki | Nobuhiro Kaji
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
In our everyday chit-chat, there is a conversation initiator, who proactively casts an initial utterance to start chatting. However, most existing conversation systems cannot play this role. Previous studies on conversation systems assume that the user always initiates conversation, and have placed emphasis on how to respond to the given user’s utterance. As a result, existing conversation systems become passive. Namely, they keep waiting until users speak to them. In this paper, we consider the system as a conversation initiator and propose a novel task of generating the initial utterance in open-domain non-task-oriented conversation. Here, to avoid boring users, it is necessary to generate diverse utterances to initiate conversation without relying on boilerplate utterances like greetings. To this end, we propose to generate the initial utterance by summarizing and chatting about news articles, which provide fresh and varied content every day. To address the lack of training data for this task, we constructed a novel large-scale dataset through crowd-sourcing. We also analyzed the dataset in detail to examine how humans initiate conversations (the dataset will be released to facilitate future research activities). We present several approaches to conversation initiation, including information-retrieval-based and generation-based models. Experimental results showed that the proposed models trained on our dataset performed reasonably well and outperformed baselines that utilize automatically collected training data in both automatic and manual evaluation.
2017
Incremental Skip-gram Model with Negative Sampling
Nobuhiro Kaji | Hayato Kobayashi
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
This paper explores an incremental training strategy for the skip-gram model with negative sampling (SGNS) from both empirical and theoretical perspectives. Existing methods of neural word embeddings, including SGNS, are multi-pass algorithms and thus cannot perform incremental model updates. To address this problem, we present a simple incremental extension of SGNS and provide a thorough theoretical analysis to demonstrate its validity. Empirical experiments demonstrated the correctness of the theoretical analysis as well as the practical usefulness of the incremental algorithm.
Predicting Causes of Reformulation in Intelligent Assistants
Shumpei Sano | Nobuhiro Kaji | Manabu Sassano
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Intelligent assistants (IAs) such as Siri and Cortana conversationally interact with users and execute a wide range of actions (e.g., searching the Web, setting alarms, and chatting). IAs can support these actions through the combination of various components such as automatic speech recognition, natural language understanding, and language generation. However, the complexity of these components hinders developers from determining which component causes an error. To remove this hindrance, we focus on reformulation, which is a useful signal of user dissatisfaction, and propose a method to predict the reformulation causes. We evaluate the method using the user logs of a commercial IA. The experimental results have demonstrated that features designed to detect the error of a specific component improve the performance of reformulation cause detection.
Chat Detection in an Intelligent Assistant: Combining Task-oriented and Non-task-oriented Spoken Dialogue Systems
Satoshi Akasaki | Nobuhiro Kaji
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recently emerged intelligent assistants on smartphones and home electronics (e.g., Siri and Alexa) can be seen as novel hybrids of domain-specific task-oriented spoken dialogue systems and open-domain non-task-oriented ones. To realize such hybrid dialogue systems, this paper investigates determining whether or not a user is going to have a chat with the system. To address the lack of benchmark datasets for this task, we construct a new dataset consisting of 15,160 utterances collected from the real log data of a commercial intelligent assistant (and will release the dataset to facilitate future research activity). In addition, we investigate using tweets and Web search queries for handling open-domain user utterances, which characterize the task of chat detection. Experiments demonstrated that, while simple supervised methods are effective, the use of the tweets and search queries further improves the F1-score from 86.21 to 87.53.
2016
Large-Scale Acquisition of Commonsense Knowledge via a Quiz Game on a Dialogue System
Naoki Otani | Daisuke Kawahara | Sadao Kurohashi | Nobuhiro Kaji | Manabu Sassano
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)
Commonsense knowledge is essential for fully understanding language in many situations. We acquire large-scale commonsense knowledge from humans using a game with a purpose (GWAP) developed on a smartphone spoken dialogue system. We transform the manual knowledge acquisition process into an enjoyable quiz game and have collected over 150,000 unique commonsense facts by gathering the data of more than 70,000 players over eight months. In this paper, we present a simple method for maintaining the quality of acquired knowledge and an empirical analysis of the knowledge acquisition process. To the best of our knowledge, this is the first work to collect large-scale knowledge via a GWAP on a widely-used spoken dialogue system.
Kotonush: Understanding Concepts Based on Values behind Social Media
Tatsuya Iwanari | Kohei Ohara | Naoki Yoshinaga | Nobuhiro Kaji | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
Kotonush, a system that clarifies people’s values on various concepts on the basis of what they write about on social media, is presented. The values are represented by ordering sets of concepts (e.g., London, Berlin, and Rome) in accordance with a common attribute intensity expressed by an adjective (e.g., entertaining). We exploit social media text written by different demographics and at different times in order to induce specific orderings for comparison. The system combines a text-to-ordering module with an interactive querying interface enabled by massive hyponymy relations and provides mechanisms to compare the induced orderings from various viewpoints. We empirically evaluate Kotonush and present some case studies, featuring real-world concept orderings with different domains on Twitter, to demonstrate the usefulness of our system.
Prediction of Prospective User Engagement with Intelligent Assistants
Shumpei Sano | Nobuhiro Kaji | Manabu Sassano
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2015
Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs
Shonosuke Ishiwatari | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the Nineteenth Conference on Computational Natural Language Learning
2014
Accurate Word Segmentation and POS Tagging for Japanese Microblogs: Corpus Annotation and Joint Modeling with Lexical Normalization
Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
Predicting and Eliciting Addressee’s Emotion in Online Dialogue
Takayuki Hasegawa | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Collective Sentiment Classification Based on User Leniency and Product Popularity
Wenliang Gao | Naoki Yoshinaga | Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)
Efficient Word Lattice Generation for Joint Word Segmentation and POS Tagging in Japanese
Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the Sixth International Joint Conference on Natural Language Processing
Modeling User Leniency and Product Popularity for Sentiment Classification
Wenliang Gao | Naoki Yoshinaga | Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
Identifying Constant and Unique Relations by using Time-Series Text
Yohei Takaku | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
2011
Sentiment Classification in Resource-Scarce Languages by using Label Propagation
Yong Ren | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Splitting Noun Compounds via Monolingual and Bilingual Paraphrasing: A Study on Japanese Katakana Words
Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
2010
Efficient Staggered Decoding for Sequence Labeling
Nobuhiro Kaji | Yasuhiro Fujiwara | Naoki Yoshinaga | Masaru Kitsuregawa
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
2009
A Combination of Active Learning and Semi-supervised Learning Starting with Positive and Unlabeled Examples for Word Sense Disambiguation: An Empirical Study on Japanese Web Search Query
Makoto Imamura | Yasuhiro Takayama | Nobuhiro Kaji | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
2008
Using Hidden Markov Random Fields to Combine Distributional and Pattern-Based Word Clustering
Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)
2007
Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents
Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
2006
Automatic Construction of Polarity-Tagged Corpus from HTML Documents
Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
2005
Lexical Choice via Topic Adaptation for Paraphrasing Written Language to Spoken Language
Nobuhiro Kaji | Sadao Kurohashi
Second International Joint Conference on Natural Language Processing: Full Papers
2004
Paraphrasing Predicates from Written Language to Spoken Language Using the Web
Nobuhiro Kaji | Masashi Okamoto | Sadao Kurohashi
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004
2002
Verb Paraphrase based on Case Frame Alignment
Nobuhiro Kaji | Daisuke Kawahara | Sadao Kurohashi | Satoshi Sato
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
2000
Japanese Case Structure Analysis
Daisuke Kawahara | Nobuhiro Kaji | Sadao Kurohashi
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics