Ch2R: A Chinese Chatter Robot for Online Shopping Guide

In this paper we present Ch2R (Chinese Chatter Robot), a conversational dialogue system for online shopping guidance that allows users to inquire about mobile phones in Chinese. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies (HLTs) as well as other system issues. We focus on a mixed-initiative conversation mechanism for interactive shopping guidance that combines initiative guiding and question understanding. We also present an evaluation of the system in the mobile phone shopping guide domain. The evaluation results demonstrate the effectiveness of our approach.


Introduction
Spoken dialogue systems are presently available for many purposes, such as the Airline Travel Information System (ATIS) project in the early 1990s (Price, 1990), customer service (Gorin et al., 1997), weather inquiry (Zue et al., 2000), campus navigation (Zhang et al., 2004), bus schedules and route guidance (Raux et al., 2003), stock information inquiry (Huang et al., 2004), restaurant recommendation (Liu et al., 2008), drug review (Liu and Seneff, 2011), and spoken route instruction (Pappu and Rudnicky, 2012). These systems have been well developed for laboratory research, and some have become commercially viable.
The next generation of intelligent dialogue systems is expected to go beyond factoid question answering and straightforward task fulfillment by providing active assistance and subjective recommendations, thus behaving more like human agents (Liu et al., 2010). For example, in the scenario that we envision, an intelligent dialogue system that plays the role of a conversational shopping guide on an online e-commerce site may suggest which digital camera is a better choice, considering brand, price, pixel count, etc., or suggest which mobile phone is the most popular among teenagers or the highest rated by users.
In this paper, we describe our development effort on a Chinese chatter robot for shopping guidance, named Ch2R (Chinese Chatter Robot), with both intelligent ability and professional knowledge. The challenges of developing such an information guiding dialogue system in Chinese include: 1) how to provide active assistance and subjective recommendations; 2) how to deal with the diversity and flexibility of the Chinese language in question understanding; 3) how to make the system adaptable enough to be easily applied as a shopping guide in a new specialized field.
To tackle the first problem, we propose a mixed-initiative framework. The proposed framework is able to take the initiative to obtain users' needs, perform passive analysis and understanding of users' questions, and switch between the two modes self-adaptively.
Our solution to the second challenge is to analyze Chinese questions by combining grammar and semantics (Huang et al., 2014). First, hand-crafted sentence compression grammar bases, including grammar rules and question type patterns, are added to the robot. By sentence compression, the diverse and flexible Chinese utterances can be recognized and categorized into a limited set of sentence structures. Then, a question understanding method is proposed that combines grammar-based question type pattern recognition with semantic-based information extraction and organization.
Finally, we address the last problem by following the design concept of building professional knowledge on top of intelligent ability, which combines a variety of human language technologies and intelligent technologies. Such a design gives Ch2R great adaptability: it can be applied to shopping guidance in a new restricted domain by adding the semantic knowledge and the detailed commodity information of that specialized field.
An example scenario of Ch2R in the mobile phone domain is shown in Figure 1. The purpose of this paper is to describe our development effort and to present some evaluation results on the system. The remainder of this paper is organized as follows: Section 2 presents the detailed framework of Ch2R. The implementation of the mixed-initiative conversation mechanism, which combines intelligent guiding and understanding within the architecture of our shopping guide robot, is described in Section 3. The professional knowledge kept by the robot is briefly presented in Section 4. Section 5 shows the preliminary evaluation results of Ch2R in the mobile phone shopping guide domain. The paper concludes by outlining future developments and possible applications in Section 6.

System Framework
Figure 2 shows the system framework of Ch2R. In the sentence preprocessing component, Chinese word segmentation and part-of-speech tagging are performed by ICTCLAS (Zhang et al., 2003). Wrongly written characters, internet language, product nicknames, etc., are also dealt with in this step.
The two main parts enclosed by bold dotted lines denote the initiative guiding mode (branch ①, e.g. lines (3), (5), and (7) in Figure 1) and the passive understanding mode (branch ②, e.g. lines (8), (12), and (14) in Figure 1). The remaining branch (branch ③, e.g. lines (2), (10), and (16) in Figure 1) covers the situation in which no valid information exists after the information extraction component has run. In that case, an AIML process, i.e. a process based on the Artificial Intelligence Markup Language (AIML) (Wallace, 2003), is used to handle simple conversations beyond the domain knowledge, cases that may switch the dialogue to initiative guiding, and the end state. The out of domain utterance processing based on AIML will be discussed in detail in a separate paper.
In the information extraction component, semantic information is extracted from the source utterance by using the semantic base. The extracted semantic information is converted into well organized semantic knowledge in the knowledge organization component.
For an interrogative sentence, i.e. a user question, we use the hand-crafted sentence compression grammar rules to perform sentence compression and then employ question type patterns in the question structure recognition component. Each input question is matched to exactly one question type pattern, which carries the information needed for semantic organization and answering (Huang et al., 2014).
The Ch2R architecture embodies the combination of intelligent ability and professional knowledge. From the intelligent perspective, sentence compression and question structure recognition components show the ability to understand and analyze questions, live-table stands for the ability to memorize, Reinforcement Learning (RL) to update the sequence of the attributes in live-table embodies the ability of self-learning, and Case Based Reasoning (CBR) of AIML process provides the capacity for logical reasoning. From the professional perspective, info-table provides detailed commodity information of a certain specialized field. The semantic knowledge of that field is stored in semantic base.

Initiative Guiding
One of the major benefits of Ch2R is that it can provide initiative guiding. We first introduce the live-table, and then briefly describe the guiding and recommendation mechanism based on the live-table.

Live-table: the Ability to Memorize
The live-table is the message storage that acts as the memory of Ch2R. The information in the live-table is live in the sense that it is active during the whole shopping guide process. There are three kinds of active information in the live-table: the attribute values, the context of the dialogue, and the recommendation list. The meaning representation of Ch2R is similar to that of other frame-based dialogue systems, in which a frame has predefined slots appropriate for the task, and understanding amounts to extracting specific fillers for each slot (e.g. Brand). Figure 3 shows the update process of live-table according to the example dialogue in mobile phone shopping guide from Figure 3 (a).
In our current design, we only keep the last sentences of both chatbot and user to support

Guiding and Recommendation
Effective guiding is achieved by looking up the unconfirmed attributes in the live-table and presenting a question about one of them. It is worth noting that the sequence of the attributes in the live-table is variable, which means that the priority of the attributes used in initiative guiding can be changed to reflect users' shopping preferences. A Reinforcement Learning (RL) process (Kaelbling et al., 1996) is used to achieve such flexibility, based on the analysis of the users' questions collected in passive understanding mode.
Ch2R could confirm every attribute in turn. However, this would lead to too many interactions with the user and would feel somewhat mechanical. We address this problem by offering a recommendation during initiative guiding once only a small number of candidates remain, e.g. 1, 2, or 3. Such recommendations limit the number of interactions and demonstrate the professionalism of Ch2R as a shopping guide.
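The guiding and recommendation loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `LiveTable`, `matching_products`, and `next_action`, the toy catalog, and the representation of the live-table as an ordered dictionary of slots are not taken from the Ch2R implementation, and the dialogue context and the RL-based reordering of attributes are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LiveTable:
    """Hypothetical live-table: slots are kept in guiding priority order."""
    slots: Dict[str, Optional[str]]          # None means "not yet confirmed"
    recommendations: List[str] = field(default_factory=list)


def matching_products(table: LiveTable, catalog: List[dict]) -> List[dict]:
    """Return catalog entries consistent with every confirmed slot."""
    confirmed = {k: v for k, v in table.slots.items() if v is not None}
    return [p for p in catalog
            if all(p.get(k) == v for k, v in confirmed.items())]


def next_action(table: LiveTable, catalog: List[dict],
                max_candidates: int = 3) -> Tuple[str, object]:
    """Recommend when few candidates remain, otherwise ask about a slot."""
    candidates = matching_products(table, catalog)
    names = [p["name"] for p in candidates]
    if 1 <= len(candidates) <= max_candidates:
        table.recommendations = names
        return ("recommend", names)
    # Ask about the highest-priority attribute that is still unconfirmed.
    for attribute, value in table.slots.items():
        if value is None:
            return ("ask", attribute)
    # Every slot is confirmed; fall back to the first few matches, if any.
    table.recommendations = names[:max_candidates]
    return ("recommend", table.recommendations)


catalog = [
    {"name": "A1", "brand": "Alpha", "os": "Android"},
    {"name": "A2", "brand": "Alpha", "os": "Android"},
    {"name": "B1", "brand": "Beta", "os": "Android"},
    {"name": "B2", "brand": "Beta", "os": "iOS"},
    {"name": "B3", "brand": "Beta", "os": "Android"},
]
table = LiveTable(slots={"brand": None, "os": None})
first = next_action(table, catalog)       # too many candidates: ask
table.slots["brand"] = "Alpha"            # the user answers the question
second = next_action(table, catalog)      # two candidates left: recommend
```

In this sketch the threshold of three candidates mirrors the "1, 2 or 3 candidates" example in the text; changing the order of the keys in `slots` changes which attribute is asked about first, which is the property the RL process is said to exploit.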

Passive Understanding
We now turn to the passive understanding mode. The word passive means that, when faced with a user's question, the chatbot has to analyze and answer it. We can observe that human beings, even a three-year-old child, can understand many sentences with different structures and respond differently according to the structure of the question. This suggests that learning and recognizing sentence type structures is the foundation of the ability to understand and analyze questions. After mastering the grammatical sentence type structures, and once the semantic knowledge of a certain domain has been learned, one can then hold a dialogue in that specialized field. Therefore, first, hand-crafted Chinese grammar bases, including grammar rules for sentence compression and question type patterns for question structure recognition, are added to the shopping guide robot; they act as the "language acquisition device" suggested by Chomsky (2005). Then, we simplify a complex sentence by sentence compression based on the grammar rules, leaving only its structure. Finally, we employ question type patterns for question structure recognition, matching exactly one question type pattern for any input question (Huang et al., 2014).

Sentence Compression
In all languages, sentences are diverse and innumerable, but sentence structures are limited. By sentence compression, the diversity of user inputs can be recognized and categorized into a limited set of sentence structures, i.e. question type patterns. Given an input source sentence of words $x = x_1, x_2, \ldots, x_n$, a target compression $y$ is formed by removing any subset of these words (Knight and Marcu, 2002). The aim of sentence compression in our system is to produce a summary of a single sentence that retains the most important structural information while remaining grammatical.
A tree-based representation is used in sentence compression. The Stanford Chinese Parser (Levy and Manning, 2003; http://nlp.stanford.edu/software/lex-parser.shtml) is employed for the tree-based parsing. In order to get a correct syntax tree from the Stanford Parser, we have to formalize the sentence, because the Stanford Parser cannot handle some sentence structures. Then, we use the hand-crafted sentence compression rules, and rely on recent work in text-to-text generation methods (Cohn and Lapata, 2009; Cohn and Lapata, 2013) to perform sentence compression.
Hand-crafted grammar rules for sentence compression are obtained by analyzing hundreds of question examples with different sentence structures. Because our system uses a tree-based representation in sentence compression, the grammar rules take forms like (NP (DNP, NP1))->NP(NP1), which states that an NP consisting of a DNP and another NP, denoted as NP1, can be rewritten as an NP consisting only of NP1 (without the DNP). Taking the wh-question "在你们店有什么2000块以下的手机？(Which mobile phones are less than 2000 RMB in your store?)" as an example, the sentence compression result is "有什么手机？(Which mobile phones?)". Figure 4 illustrates this example based on the hand-crafted grammar rule base used in our system.
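To make the rule (NP (DNP, NP1))->NP(NP1) concrete, the following is a minimal sketch of applying it to a syntax tree. The tree is written by hand as nested tuples and is only a plausible, simplified bracketing of a fragment of the example question; it is not actual Stanford Parser output, and the function names `compress` and `leaves` are hypothetical. Only this single rule is implemented, so the sketch does not remove the locative phrase "在你们店" that the full rule base also drops.

```python
def is_leaf(tree):
    """A leaf is (tag, word), where word is a plain string."""
    return len(tree) == 2 and isinstance(tree[1], str)


def compress(tree):
    """Apply the rule (NP (DNP, NP1)) -> NP(NP1) bottom-up."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    children = [compress(child) for child in children]
    if (label == "NP" and len(children) == 2
            and children[0][0] == "DNP" and children[1][0] == "NP"):
        # Drop the DNP modifier; NP1 is itself an NP, so return it directly.
        return children[1]
    return (label, *children)


def leaves(tree):
    """Collect the words of a tree from left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


# Hand-written, simplified tree for "有什么2000块以下的手机" (an assumption).
tree = ("VP",
        ("VV", "有"),
        ("NP",
         ("DP", ("DT", "什么")),
         ("NP",
          ("DNP", ("QP", "2000块以下"), ("DEG", "的")),
          ("NP", ("NN", "手机")))))

compressed = compress(tree)
compressed_text = "".join(leaves(compressed))
```

With this tree, the modifier "2000块以下的" is removed and the remaining words spell out the compressed question given in the text.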

Question Type Pattern Recognition
Question type pattern recognition is important for the later steps of information organization and answering. However, building an effective knowledge base of question type patterns is a challenge, especially in Chinese, which is unlike English, where the question word can basically represent the class of the interrogative. We design a 4-set question type pattern as {interrogative sentence type, interrogative word type, interrogative phrases, sentence structure}. Taking the compressed sentence "有什么手机？(Which mobile phones?)" as an example, its question type pattern is "(特指问，什么，什么/哪*，VP+~+NP) (wh-questions, which, which, VP+~+NP)", where ~ stands for the interrogative phrase. Other questions, such as "有哪些手机？(Which mobile phones?)" and "有哪款手机？(Which mobile phones?)", will be recognized as this question type pattern. Notice that a more complex question, such as "有什么2000块以下的大屏幕的手机？(Which mobile phones are less than 2000 RMB and with big screen?)", and an informal user input, such as "有什么2000块左右的？(Which are about 2000 RMB?)", will also be recognized as the same question type pattern after sentence compression and question type pattern recognition, which shows the robustness of our design. It also leads to good performance with a limited number of question type patterns (30 question type patterns in our current dialogue system) (Huang et al., 2014). The procedure of question type pattern recognition is shown in Figure 5.
Because Chinese word segmentation in the Stanford Parser is inaccurate, we raise the matching rate by re-tagging the words with a more accurate word segmentation interface in the first step of question type pattern recognition. Given a compressed interrogative sentence, ICTCLAS (Zhang et al., 2003) is used to re-tag the words and obtain the syntax sequence.
In the 4-set question type pattern, the interrogative word type is not used as a recognition factor. The similarities of the interrogative sentence type, the interrogative phrases, and the sentence structure are taken as the three factors for computing the similarity between the compressed interrogative sentence and any question type pattern in the question type pattern base. Figure 6 shows how to calculate the similarity between the source syntax sequence and the target syntax sequence, i.e. the sentence structure in a certain question type pattern. In step 1 of Figure 6, different syntactic tags carry different score weights. For instance, modal particles, adverbs, and punctuation have a lower weight in the score calculation, whereas nouns, verbs, and interrogative words have a higher weight. In step 2, the highest score is calculated by the Edit Distance algorithm (Ristad and Yianilos, 1998).
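The sentence structure similarity can be sketched as a score-maximizing variant of edit distance, normalized by the score of matching the target against itself. The sketch below is an illustration under stated assumptions rather than the system's implementation: the tag names ("n", "v", "r", "y") follow common ICTCLAS-style part-of-speech labels, and the concrete weights (3.0 for nouns, verbs, and pronouns, 1.0 otherwise) and the halved replacement penalty are invented for the example. Only the signs of the scores (negative for replace, add, and delete; positive for keep) come from the description.

```python
from typing import List, Sequence

# Assumed weighting: content-bearing tags count more than particles etc.
HIGH_WEIGHT_TAGS = {"n", "v", "r"}


def weight(tag: str) -> float:
    return 3.0 if tag in HIGH_WEIGHT_TAGS else 1.0


def keep_score(tag: str) -> float:       # SN, always positive
    return weight(tag)


def delete_score(tag: str) -> float:     # SD, always negative
    return -weight(tag)


def add_score(tag: str) -> float:        # SA, always negative
    return -weight(tag)


def replace_score(tag: str) -> float:    # SR, always negative
    return -weight(tag) / 2


def best_score(x: Sequence[str], y: Sequence[str]) -> float:
    """Highest total score over all ways of transforming x into y."""
    rows, cols = len(x) + 1, len(y) + 1
    best: List[List[float]] = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        best[i][0] = best[i - 1][0] + delete_score(x[i - 1])
    for j in range(1, cols):
        best[0][j] = best[0][j - 1] + add_score(y[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            if x[i - 1] == y[j - 1]:
                diagonal = best[i - 1][j - 1] + keep_score(x[i - 1])
            else:
                diagonal = (best[i - 1][j - 1]
                            + replace_score(x[i - 1]) + replace_score(y[j - 1]))
            best[i][j] = max(diagonal,
                             best[i - 1][j] + delete_score(x[i - 1]),
                             best[i][j - 1] + add_score(y[j - 1]))
    return best[-1][-1]


def similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Return 100 * s / s_max, where s_max is the score of y against itself."""
    s_max = best_score(y, y)
    if s_max == 0:
        return 0.0
    return 100.0 * best_score(x, y) / s_max
```

Under these assumed weights, a source sequence that differs from the target only by an extra modal particle stays close to 100, while one that is missing a noun is penalized much more heavily, which is the behaviour the weighting in step 1 is meant to produce.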

Professional Knowledge
The detailed commodity information and the semantic knowledge of a given restricted domain constitute the professional knowledge that must be added to Ch2R when it is applied as a shopping guide in that domain.

Info-table: Detailed Commodity Information
The info-table, which is the basic professional knowledge base of Ch2R, provides detailed commodity information of a certain specialized field. There are 89 attributes in total in the info-table of the mobile phone domain: one serves as the primary key, and the other 88 provide the commodity information in detail. These attributes are selected from major mobile phone e-commerce sites. The info-table acts as the robot's complete product information. In other words, it gives the robot more complete information than a human online shopping guide would have.
Algorithm 2: Sentence structure similarity calculation
Input: source syntax sequence x, target syntax sequence y
Output: similarity score, 100 for maximum
Reference algorithm: classical Edit Distance
Procedure:
1. For each syntax in x, do one of the four operations below:
   e. replace a syntax α in x with β by score SR(α) + SR(β), which is always negative.
   f. add a syntax α to x with score SA(α), which is always negative.
   g. delete a syntax α in x with score SD(α), which is always negative.
   h. do nothing with a syntax α and get SN(α), which is always positive.
2. Transform x into y with the operations above and find the highest score s.
3. Transform y into y with the operations above and find the highest score s_max.
4. Return the final similarity score, 100*s/s_max.

[Question type pattern recognition procedure; header and opening steps missing]
   a. judge if s has the features of the interrogative sentence type in a certain question type pattern. Here is the score c1.
   b. judge if s has the interrogative phrases in a certain question type pattern. Here is the score c2.
   c. calculate the similarity between ss and the sentence structures in question type patterns. The similarity is c3.
   d. calculate the final similarity c = c1*w1 + c2*w2 + c3*w3, where w1, w2, and w3 are weights between 0 and 1.
3. Find the highest score c; the corresponding pattern is the question type pattern of s.
4. Return the matched question type pattern.

[Semantic base entry for price; beginning of the Chinese entry missing]
Chinese: ... (0<数字<20000)+块/元, 数字(0<=数字<20000)+到+数字(0<=数字<20000).
English: price, how much, cheap [0,1000), moderate price [1000,2000), expensive [2000,), about, approximately, more than, less than, not higher than, less/lower/cheaper, higher/more expensive, (0<=digits<20000)+RMB/Yuan, (0<=digits<20000)+RMB/Yuan to (0<=digits<20000)+RMB/Yuan.

The semantic base is used for semantic information extraction. Taking the source sentence "有什么2000块以下的大屏幕的手机？(Which mobile phones are less than 2000 RMB and with big screen?)" as an example, the extracted semantic information is "价格: 2000块, 以下; 主屏尺寸: 屏幕, 大 (Price: 2000 RMB, less than; Screen_size: screen, big)".
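As a rough illustration of semantic information extraction, the sketch below pulls the price and screen-size information out of the example sentence with two regular expressions. It is a simplification under stated assumptions: the real semantic base is a knowledge base of terms per attribute, not a pair of patterns; the function name `extract`, the comparator list (以上 is added here as an assumption, while 以下 and 左右 appear in the paper's examples), and the 大/小 screen pattern are hypothetical. The bound of 20000 on the price follows the price entry of the semantic base.

```python
import re
from typing import Dict, Optional, Tuple

# Amount, currency unit, and an optional comparator such as 以下 (less than).
PRICE = re.compile(r"(\d+)\s*(块|元)\s*(以下|以上|左右)?")
# A size adjective directly in front of 屏幕 (screen).
SCREEN = re.compile(r"(大|小)\s*屏幕")


def extract(utterance: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return attribute -> (matched term, qualifier) for a user utterance."""
    info: Dict[str, Tuple[str, Optional[str]]] = {}
    price = PRICE.search(utterance)
    if price and 0 < int(price.group(1)) < 20000:
        info["价格"] = (price.group(1) + price.group(2), price.group(3))
    screen = SCREEN.search(utterance)
    if screen:
        info["主屏尺寸"] = ("屏幕", screen.group(1))
    return info


example = extract("有什么2000块以下的大屏幕的手机？")
informal = extract("有什么 2000 块左右的？")
no_info = extract("你好")
```

An utterance for which `extract` returns an empty dictionary corresponds to the case where no valid information is found, which in Ch2R is routed to the AIML-based out of domain branch.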
Then, the extracted semantic information is converted into well organized semantic knowledge based on the corresponding question type pattern. The extracted semantic information in the above example is organized as "'价格: 2000块,以下' and '主屏尺寸: 屏幕,大' ('Price: 2000 RMB, less than' and 'Screen_size: screen, big')". Generally, the knowledge organization of question type patterns with the same interrogative sentence type will be the same or at least similar. Figure 7 shows a screenshot of Ch2R as a Web-based application. The chat log is also shown on the web page, which makes it convenient for the customer to look it over.

Preliminary Evaluation Results
We performed a preliminary system evaluation by logging the interactions of 6 subjects with the system. Each evaluator tested the system 3 times, for a total of 18 dialogues. All of the evaluators were familiar with the capabilities of the Ch2R system, but did not have detailed knowledge of what constitutes a correct reference answer. The overall statistical results are shown in Table 3. Branches ①, ② and ③ stand for the turns of initiative guiding, passive understanding, and out of domain utterance processing, respectively. As we can see from Table 3, the large gaps between the maximum and the minimum number of turns in both branch ① and branch ② show the diversity of the evaluators. Some of them like to ask questions, while others prefer system initiative.
The results for the initiative guiding mode are given in Table 4. Our system provided successful active guiding in 100 of the 105 turns of initiative guiding and failed in only 5. One of the errors was due to a change of user intent, i.e., the user's intent changed but the system failed to catch the change and update the live-table. The other four errors were due to the imperfection of the current semantic base, which resulted in attribute values being incorrectly extracted from utterances and thus affected the later guiding.
The results for question understanding are given in Table 5. There are 96 user questions in total in the test, 3 of which incorrectly entered the out of domain utterance processing branch due to the imperfection of the semantic base. Of the 93 user questions that entered branch ②, 90.3% were correctly understood (both semantic information extraction and question type pattern recognition were correct), including some utterances with typing mistakes or ellipsis. We also found that the 93 user questions are distributed over only 12 different question type patterns. That means that, of the 30 question type patterns in the current system, 18 did not occur in the test. This is mainly because most of the questions in the test are wh-questions and yes-no questions. Of the 9 incorrect cases, most were due to the inaccuracy of part-of-speech tagging and the imperfection of the current semantic base. Only 2 errors were due to factors related to the design of the question type patterns.
The results for out of domain utterance processing are given in Table 6. Of the 5 sentences that received incorrect answers, 4 were also due to the imperfection of the semantic base, which incorrectly led the dialogue into the out of domain utterance processing branch.

Conclusions and Future Work
This paper presents the development and preliminary evaluation of a Chinese conversational dialogue system named Ch2R, which combines intelligent ability and professional knowledge for online shopping guidance. As the evaluation results show, it performs well as a mobile phone shopping guide in all three situations: initiative guiding, passive understanding, and out of domain utterance processing. Although still at an early stage, by combining a variety of human language technologies and intelligent technologies into an integrated framework, it can converse like a human being and provide a professional service. Moreover, the design concept of building professional knowledge on top of intelligent ability gives Ch2R great adaptability, so that it can be easily applied to shopping guidance in other specialized fields.
There are many possible and promising research directions for the near future. We are implementing new and fun modes of interaction, such as voice communication through WeChat. Moreover, a separate dialogue management component with an explicit dialogue model will be added to the system. In addition, we want to experiment with a larger number and more varied types of users, which will make Ch2R more robust.