2012
Causal analysis of task completion errors in spoken music retrieval interactions
Sunao Hara
|
Norihide Kitaoka
|
Kazuya Takeda
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In this paper, we analyze the causes of task completion errors in spoken dialog systems, using a decision tree with N-gram features of the dialog to detect task-incomplete dialogs. The dialog for a music retrieval task is described by a sequence of tags related to user and system utterances and behaviors. The dialogs are manually classified into two classes: completed and uncompleted music retrieval tasks. Differences in tag classification performance between the two classes are discussed. We then construct decision trees, using the information gain criterion, that can detect whether or not a dialog finished with the task completed. Decision trees using N-grams of manual tags and automatic tags achieved 74.2% and 80.4% classification accuracy, respectively, while the tree using interaction parameters achieved an accuracy of 65.7%. We also discuss in more detail the causes of task incompletion in spoken dialog systems using such trees.
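As a rough illustration of the information gain criterion mentioned in the abstract above, the sketch below scores a single N-gram of dialog tags by how well its presence separates completed from uncompleted dialogs. It is not the authors' implementation: the tag names, the toy dialogs, and the helper functions `ngrams`, `entropy`, and `information_gain` are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tags, n):
    """Return the set of n-grams (as tuples) that occur in a tag sequence."""
    return {tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)}


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(dialogs, labels, gram):
    """Entropy reduction obtained by splitting dialogs on presence of `gram`."""
    n = len(gram)
    present = [y for tags, y in zip(dialogs, labels) if gram in ngrams(tags, n)]
    absent = [y for tags, y in zip(dialogs, labels) if gram not in ngrams(tags, n)]
    total = len(labels)
    remainder = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - remainder


# Hypothetical tag sequences and labels, invented for illustration only.
dialogs = [
    ["sys_prompt", "usr_request", "sys_confirm", "usr_yes"],
    ["sys_prompt", "usr_request", "sys_reject", "usr_request", "sys_reject"],
    ["sys_prompt", "usr_request", "sys_confirm", "usr_yes"],
    ["sys_prompt", "usr_request", "sys_reject", "usr_quit"],
]
labels = ["completed", "uncompleted", "completed", "uncompleted"]

# A bigram that appears only in uncompleted dialogs is highly informative;
# a bigram that appears in every dialog carries no information.
gain_reject = information_gain(dialogs, labels, ("usr_request", "sys_reject"))
gain_prompt = information_gain(dialogs, labels, ("sys_prompt", "usr_request"))
```

A decision tree learner would repeat this scoring over all candidate N-grams at each node and split on the one with the highest gain.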
2010
Estimation Method of User Satisfaction Using N-gram-based Dialog History Model for Spoken Dialog System
Sunao Hara
|
Norihide Kitaoka
|
Kazuya Takeda
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper, we propose a method for estimating user satisfaction with a spoken dialog system using an N-gram-based dialog history model. We have collected a large amount of spoken dialog data accompanied by usability evaluation scores given by users in real environments. The database was built from a field test in which naive users used a client-server music retrieval system with a spoken dialog interface on their own PCs. An N-gram model is trained from the sequences that consist of users' dialog acts and/or the system's dialog acts for each one of six user satisfaction levels: from 1 to 5 and φ (task not completed). Then, the satisfaction level is estimated based on the N-gram likelihood. Experiments were conducted on this large real-world data set, and the results show that our proposed method achieved good classification performance; the classification accuracy was 94.7% in the experiment on classifying dialogs into those with task completion and those without. Even when the classifier detected all of the task-incomplete dialogs correctly, our proposed method achieved a false detection rate of only 6%.
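The likelihood-based estimation described in the abstract above can be sketched as follows: one N-gram model per satisfaction level, with a new dialog assigned to the level whose model gives it the highest likelihood. This is a minimal sketch, not the authors' system. It assumes bigrams, add-one smoothing, and no class prior, none of which the abstract specifies; the dialog act names, the toy data, and the names `BigramModel`, `train`, and `estimate` are hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # sequence boundary markers (assumed)


def bigrams(acts):
    """Pair each dialog act with its predecessor, including boundaries."""
    seq = [BOS] + list(acts) + [EOS]
    return list(zip(seq, seq[1:]))


class BigramModel:
    """Bigram model over dialog acts with add-one smoothing (an assumption)."""

    def __init__(self, sequences, vocab):
        self.vocab_size = len(vocab) + 1  # +1 because EOS can follow any act
        self.pair_counts = Counter()
        self.context_counts = Counter()
        for acts in sequences:
            for prev, cur in bigrams(acts):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_likelihood(self, acts):
        """Smoothed log-probability of a whole dialog act sequence."""
        return sum(
            math.log((self.pair_counts[(prev, cur)] + 1)
                     / (self.context_counts[prev] + self.vocab_size))
            for prev, cur in bigrams(acts)
        )


def train(labelled):
    """Train one model per satisfaction level from (acts, level) pairs."""
    vocab = {act for acts, _ in labelled for act in acts}
    by_level = defaultdict(list)
    for acts, level in labelled:
        by_level[level].append(acts)
    return {level: BigramModel(seqs, vocab) for level, seqs in by_level.items()}


def estimate(models, acts):
    """Return the level whose model assigns the highest likelihood."""
    return max(models, key=lambda level: models[level].log_likelihood(acts))


# Hypothetical training data: "5" is the top score, "phi" is task not completed.
labelled = [
    (["request", "confirm", "play"], "5"),
    (["request", "confirm", "play"], "5"),
    (["request", "reject", "request", "reject"], "phi"),
    (["request", "reject", "quit"], "phi"),
]
models = train(labelled)
smooth_dialog = estimate(models, ["request", "confirm", "play"])
failed_dialog = estimate(models, ["request", "reject", "request", "reject", "quit"])
```

With only two levels in the toy data this reduces to the completed versus not-completed decision; the same `estimate` call works unchanged with models for all six levels.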
2008
Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments: newest Part of the CENSREC Series -
Takanobu Nishiura
|
Masato Nakayama
|
Yuki Denda
|
Norihide Kitaoka
|
Kazumasa Yamamoto
|
Takeshi Yamada
|
Satoru Tsuge
|
Chiyomi Miyajima
|
Masakiyo Fujimoto
|
Tetsuya Takiguchi
|
Satoshi Tamura
|
Shingo Kuroiwa
|
Kazuya Takeda
|
Satoshi Nakamura
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Recently, speech recognition performance has been drastically improved by statistical methods and huge speech databases. The focus is now on improving performance in realistic environments such as noisy conditions. Since October 2001, we, the working group of the Information Processing Society in Japan, have been working on evaluation methodologies and frameworks for Japanese noisy speech recognition. We have released frameworks including databases and evaluation tools called CENSREC-1 (Corpus and Environment for Noisy Speech RECognition 1; formerly AURORA-2J), CENSREC-2 (in-car connected digits recognition), CENSREC-3 (in-car isolated word recognition), and CENSREC-1-C (voice activity detection under noisy conditions). In this paper, we introduce a new collection of databases and evaluation tools named CENSREC-4, which is an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition. The results of evaluation experiments showed that CENSREC-4 is an effective database suitable for evaluating new dereverberation methods, because the traditional dereverberation process had difficulty sufficiently improving the recognition performance. The framework was released in March 2008, and many studies are being conducted with it in Japan.
In-car Speech Data Collection along with Various Multimodal Signals
Akira Ozaki
|
Sunao Hara
|
Takashi Kusakawa
|
Chiyomi Miyajima
|
Takanori Nishino
|
Norihide Kitaoka
|
Katunobu Itou
|
Kazuya Takeda
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper, a large-scale real-world speech database is introduced along with other multimedia driving data. We designed a data collection vehicle equipped with various sensors to synchronously record twelve-channel speech, three-channel video, driving behavior including gas and brake pedal pressures, steering angles, and vehicle velocities, and physiological signals including driver heart rate, skin conductance, and emotion-based sweating on the palms and soles. These multimodal data are collected while driving on city streets and expressways under four different driving task conditions: two kinds of monologues, human-human dialog, and human-machine dialog. We investigated the response timing of drivers to navigator utterances and found that most responses overlapped with the preceding utterance due to the task characteristics and the features of Japanese. When comparing utterance length, speaking rate, and the filler rate of driver utterances in human-human and human-machine dialogs, we found that drivers tended to use longer and faster utterances with more fillers when talking with humans than with machines.
2006
Statistical Analysis for Thesaurus Construction using an Encyclopedic Corpus
Yasunori Ohishi
|
Katunobu Itou
|
Kazuya Takeda
|
Atsushi Fujii
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper proposes a discrimination method for hierarchical relations between word pairs. The method is a statistical one using an encyclopedic corpus extracted and organized from Web pages. In the proposed method, we use the statistical nature that hyponyms' descriptions tend to include hypernyms whereas hypernyms' descriptions do not include all of the hyponyms. Experimental results show that the method detected 61.7% of the relations in an actual thesaurus.
2002
The Present Status of Speech Database in Japan: Development, Management, and Application to Speech Research
Hisao Kuwabara
|
Shuichi Itahashi
|
Mikio Yamamoto
|
Toshiyuki Takezawa
|
Satoshi Nakamura
|
Kazuya Takeda
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
Multi-Dimensional Data Acquisition for Integrated Acoustic Information Research
Nobuo Kawaguchi
|
Shigeki Matsubara
|
Kazuya Takeda
|
Fumitada Itakura
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
Continuous Speech Recognition Consortium an Open Repository for CSR Tools and Models
Akinobu Lee
|
Tatsuya Kawahara
|
Kazuya Takeda
|
Masato Mimura
|
Atsushi Yamada
|
Akinori Ito
|
Katsunobu Itou
|
Kiyohiro Shikano
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2000
IPA Japanese Dictation Free Software Project
Katsunobu Itou
|
Kiyohiro Shikano
|
Tatsuya Kawahara
|
Kazuya Takeda
|
Atsushi Yamada
|
Akinori Itou
|
Takehito Utsuro
|
Tetsunori Kobayashi
|
Nobuaki Minematsu
|
Mikio Yamamoto
|
Shigeki Sagayama
|
Akinobu Lee
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)