Chiori Hori

2014

This paper describes our automatic speech recognition system for IWSLT2014 evaluation campaign. The system is based on weighted finite-state transducers and a combination of multiple subsystems which consists of four types of acoustic feature sets, four types of acoustic models, and N-gram and recurrent neural network language models. Compared with our system used in last year, we added additional subsystems based on deep neural network modeling on filter bank feature and convolutional deep neural network modeling on filter bank feature with tonal features. In addition, modifications and improvements on automatic acoustic segmentation and deep neural network speaker adaptation were applied. Compared with our last year’s system on speech recognition experiments, our new system achieved 21.5% relative improvement on word error rate on the 2013 English test data set.

pdf bib

Recurrent Neural Network-based Tuple Sequence Model for Machine Translation
Youzheng Wu | Taro Watanabe | Chiori Hori
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib abs

This study presents the NICT automatic speech recognition (ASR) system submitted for the IWSLT 2013 ASR evaluation. We apply two types of acoustic features and three types of acoustic models to the NICT ASR system. Our system is comprised of six subsystems with different acoustic features and models. This study reports the individual results and fusion of systems and highlights the improvements made by our proposed methods that include the automatic segmentation of audio data, language model adaptation, speaker adaptive training of deep neural network models, and the NICT SprinTra decoder. Our experimental results indicated that our proposed methods offer good performance improvements on lecture speech recognition tasks. Our results denoted a 13.5% word error rate on the IWSLT 2013 ASR English test data set.

2012

pdf bib

pdf bib abs

This paper describes our automatic speech recognition (ASR) system for the IWSLT 2012 evaluation campaign. The target data of the campaign is selected from the TED talks, a collection of public speeches on a variety of topics spoken in English. Our ASR system is based on weighted finite-state transducers and exploits an combination of acoustic models for spontaneous speech, language models based on n-gram and factored recurrent neural network trained with effectively selected corpora, and unsupervised topic adaptation framework utilizing ASR results. Accordingly, the system achieved 10.6% and 12.0% word error rate for the tst2011 and tst2012 evaluation set, respectively.

pdf bib abs

In this study, we extend recurrent neural network-based language models (RNNLMs) by explicitly integrating morphological and syntactic factors (or features). Our proposed RNNLM is called a factored RNNLM that is expected to enhance RNNLMs. A number of experiments are carried out on top of state-of-the-art LVCSR system that show the factored RNNLM improves the performance measured by perplexity and word error rate. In the IWSLT TED test data sets, absolute word error rate reductions over RNNLM and n-gram LM are 0.4∼0.8 points.

2011

pdf bib abs

Investigation of the effects of ASR tuning on speech translation performance
Paul R. Dixon | Andrew Finch | Chiori Hori | Hideki Kashioka
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper we describe some of our recent investigations into ASR and SMT coupling issues from an ASR perspective. Our study was motivated by several areas: Firstly, to understand how standard ASR tuning procedures effect the SMT performance and whether it is safe to perform this tuning in isolation. Secondly, to investigate how vocabulary and segmentation mismatches between the ASR and SMT system effect the performance. Thirdly, to uncover any practical issues that arise when using a WFST based speech decoder for tight coupling as opposed to a more traditional tree-search decoding architecture. On the IWSLT07 Japanese-English task we found that larger language model weights only helped the SMT performance when the ASR decoder was tuned in a sub-optimal manner. When we considered the performance with suitable wide beams that ensured the ASR accuracy had converged we observed the language model weight had little influence on the SMT BLEU scores. After the construction of the phrase table the actual SMT vocabulary can be less than the training data vocabulary. By reducing the ASR lexicon to only cover the words the SMT system could accept, we found this lead to an increase in the ASR error rates, however the SMT BLEU scores were nearly unchanged. From a practical point of view this is a useful result as it means we can significantly reduce the memory footprint of the ASR system. We also investigated coupling WFST based ASR to a simple WFST based translation decoder and found it was crucial to perform phrase table expansion to avoid OOV problems. For the WFST translation decoder we describe a semiring based approach for optimizing the log-linear weights.

pdf bib abs

In this paper, we describe NICT’s participation in the IWSLT 2011 evaluation campaign for the ASR Track. To recognize spontaneous speech, we prepared an acoustic model trained by more spontaneous speech corpora and a language model constructed with text corpora distributed by the organizer. We built the multi-pass ASR system by adapting the acoustic and language models with previous ASR results. The target speech was selected from talks on the TED (Technology, Entertainment, Design) program. Here, a large reduction in word error rate was obtained by the speaker adaptation of the acoustic model with MLLR. Additional improvement was achieved not only by adaptation of the language model but also by parallel usage of the baseline and speaker-dependent acoustic models. Accordingly, the final WER was reduced by 30% from the baseline ASR for the distributed test set.

pdf bib

Answering Complex Questions via Exploiting Social Q&A Collection
Youzheng Wu | Chiori Hori | Hisashi Kawai | Hideki Kashioka
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib

Improving Related Entity Finding via Incorporating Homepages and Recognizing Fine-grained Entities
Youzheng Wu | Chiori Hori | Hisashi Kawai | Hideki Kashioka
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib

pdf bib abs

Dialogue Acts Annotation for NICT Kyoto Tour Dialogue Corpus to Construct Statistical Dialogue Systems
Kiyonori Ohtake | Teruhisa Misu | Chiori Hori | Hideki Kashioka | Satoshi Nakamura
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces a new corpus of consulting dialogues designed for training a dialogue manager that can handle consulting dialogues through spontaneous interactions from the tagged dialogue corpus. We have collected more than 150 hours of consulting dialogues in the tourist guidance domain. We are developing the corpus that consists of speech, transcripts, speech act (SA) tags, morphological analysis results, dependency analysis results, and semantic content tags. This paper outlines our taxonomy of dialogue act (DA) annotation that can describe two aspects of an utterance: the communicative function (SA), and the semantic content of the utterance. We provide an overview of the Kyoto tour dialogue corpus and a preliminary analysis using the DA tags. We also show a result of a preliminary experiment for SA tagging via Support Vector Machines (SVMs). We introduce the current states of the corpus development In addition, we mention the usage of our corpus for the spoken dialogue system that is being developed.

2009

pdf bib abs

This demo shows the network-based speech-to-speech translation system. The system was designed to perform realtime, location-free, multi-party translation between speakers of different languages. The spoken language modules: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS), are connected through Web servers that can be accessed via client applications worldwide. In this demo, we will show the multiparty speech-to-speech translation of Japanese, Chinese, Indonesian, Vietnamese, and English, provided by the NICT server. These speech-to-speech modules have been developed by NICT as a part of A-STAR (Asian Speech Translation Advanced Research) consortium project1.

pdf bib

Annotating Dialogue Acts to Construct Dialogue Systems for Consulting
Kiyonori Ohtake | Teruhisa Misu | Chiori Hori | Hideki Kashioka | Satoshi Nakamura
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2005

pdf bib

pdf bib

Overview of the IWSLT 2005 Evaluation Campaign
Matthias Eck | Chiori Hori
Proceedings of the Second International Workshop on Spoken Language Translation

pdf bib

Machine Translation Evaluation Inside QARLA
Enrike Amigo | Jesus Gimenez | Chiori Hori
Proceedings of the Second International Workshop on Spoken Language Translation

2004

pdf bib

Evaluation Measures Considering Sentence Concatenation for Automatic Summarization by Sentence or Word Extraction
Chiori Hori | Tsutomu Hirao | Hideki Isozaki
Text Summarization Branches Out

2003

pdf bib

Co-authors

Venues

LREC1

SIGDIAL1

WS1

Fix author