2012
pdf
bib
abs
The NICT ASR system for IWSLT2012
Hitoshi Yamamoto
|
Youzheng Wu
|
Chien-Lin Huang
|
Xugang Lu
|
Paul R. Dixon
|
Shigeki Matsuda
|
Chiori Hori
|
Hideki Kashioka
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes our automatic speech recognition (ASR) system for the IWSLT 2012 evaluation campaign. The target data of the campaign is selected from the TED talks, a collection of public speeches on a variety of topics delivered in English. Our ASR system is based on weighted finite-state transducers and exploits a combination of acoustic models for spontaneous speech, language models based on n-grams and a factored recurrent neural network trained with effectively selected corpora, and an unsupervised topic adaptation framework utilizing ASR results. Accordingly, the system achieved 10.6% and 12.0% word error rates on the tst2011 and tst2012 evaluation sets, respectively.
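As background on how the n-gram and recurrent neural network language model scores might be combined during rescoring, a common scheme (an assumption here, not a detail taken from the paper) is linear interpolation of the two probabilities:

P(w \mid h) = \lambda\, P_{\mathrm{ngram}}(w \mid h) + (1-\lambda)\, P_{\mathrm{RNN}}(w \mid h), \qquad 0 \le \lambda \le 1,

with \lambda tuned on development data.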
pdf
bib
abs
Factored recurrent neural network language model in TED lecture transcription
Youzheng Wu
|
Hitoshi Yamamoto
|
Xugang Lu
|
Shigeki Matsuda
|
Chiori Hori
|
Hideki Kashioka
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
In this study, we extend recurrent neural network-based language models (RNNLMs) by explicitly integrating morphological and syntactic factors (or features). Our proposed model, called a factored RNNLM, is expected to enhance conventional RNNLMs. A number of experiments carried out on top of a state-of-the-art LVCSR system show that the factored RNNLM improves performance as measured by perplexity and word error rate. On the IWSLT TED test data sets, absolute word error rate reductions over the RNNLM and n-gram LM are 0.4∼0.8 points.
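A minimal sketch of a factored input layer, in our own notation rather than the paper's: each word is represented by the concatenation of its word embedding and the embeddings of its factors (e.g., part-of-speech or morphological tags), which then drives a standard recurrent layer and softmax output:

x_t = [\, e(w_t);\ e(f_t^{1});\ \dots;\ e(f_t^{K}) \,], \qquad h_t = \sigma(U x_t + W h_{t-1}), \qquad P(w_{t+1} \mid w_{1:t}, f_{1:t}) = \mathrm{softmax}(V h_t).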
pdf
bib
Factored Language Model based on Recurrent Neural Network
Youzheng Wu
|
Xugang Lu
|
Hitoshi Yamamoto
|
Shigeki Matsuda
|
Chiori Hori
|
Hideki Kashioka
Proceedings of COLING 2012
2011
pdf
bib
abs
The NICT ASR system for IWSLT2011
Kazuhiko Abe
|
Youzheng Wu
|
Chien-lin Huang
|
Paul R. Dixon
|
Shigeki Matsuda
|
Chiori Hori
|
Hideki Kashioka
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we describe NICT’s participation in the IWSLT 2011 evaluation campaign for the ASR Track. To recognize spontaneous speech, we prepared an acoustic model trained on additional spontaneous speech corpora and a language model constructed from text corpora distributed by the organizer. We built a multi-pass ASR system by adapting the acoustic and language models with previous ASR results. The target speech was selected from talks on the TED (Technology, Entertainment, Design) program. A large reduction in word error rate was obtained by speaker adaptation of the acoustic model with MLLR. Additional improvement was achieved not only by adaptation of the language model but also by parallel use of the baseline and speaker-dependent acoustic models. Accordingly, the final WER was reduced by 30% from the baseline ASR on the distributed test set.
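For reference, MLLR speaker adaptation estimates an affine transform of the acoustic model's Gaussian means from adaptation data (here, the previous-pass ASR results), so each adapted mean is

\hat{\mu} = A\mu + b,

where A and b are chosen to maximize the likelihood of the adaptation data under the transformed model.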
pdf
bib
abs
Investigation of the effects of ASR tuning on speech translation performance
Paul R. Dixon
|
Andrew Finch
|
Chiori Hori
|
Hideki Kashioka
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper we describe some of our recent investigations into ASR and SMT coupling issues from an ASR perspective. Our study was motivated by several questions: firstly, to understand how standard ASR tuning procedures affect SMT performance and whether it is safe to perform this tuning in isolation; secondly, to investigate how vocabulary and segmentation mismatches between the ASR and SMT systems affect performance; and thirdly, to uncover any practical issues that arise when using a WFST-based speech decoder for tight coupling as opposed to a more traditional tree-search decoding architecture. On the IWSLT07 Japanese-English task we found that larger language model weights only helped the SMT performance when the ASR decoder was tuned in a sub-optimal manner. When we considered the performance with suitably wide beams that ensured the ASR accuracy had converged, we observed that the language model weight had little influence on the SMT BLEU scores. After the construction of the phrase table, the actual SMT vocabulary can be smaller than the training data vocabulary. By reducing the ASR lexicon to cover only the words the SMT system could accept, we found this led to an increase in the ASR error rates; however, the SMT BLEU scores were nearly unchanged. From a practical point of view this is a useful result, as it means we can significantly reduce the memory footprint of the ASR system. We also investigated coupling WFST-based ASR to a simple WFST-based translation decoder and found it was crucial to perform phrase table expansion to avoid OOV problems. For the WFST translation decoder we describe a semiring-based approach for optimizing the log-linear weights.
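The language model weight discussed above enters the decoder through the usual log-linear combination of model scores; a generic form (assumed here, not quoted from the paper) is

\hat{W} = \arg\max_{W} \left[ \log P_{\mathrm{AM}}(X \mid W) + \lambda_{\mathrm{LM}} \log P_{\mathrm{LM}}(W) + \lambda_{\mathrm{wip}} |W| \right],

where \lambda_{\mathrm{LM}} is the language model weight and \lambda_{\mathrm{wip}} a word insertion penalty.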
pdf
bib
Improving Related Entity Finding via Incorporating Homepages and Recognizing Fine-grained Entities
Youzheng Wu
|
Chiori Hori
|
Hisashi Kawai
|
Hideki Kashioka
Proceedings of 5th International Joint Conference on Natural Language Processing
pdf
bib
Answering Complex Questions via Exploiting Social Q&A Collection
Youzheng Wu
|
Chiori Hori
|
Hisashi Kawai
|
Hideki Kashioka
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
pdf
bib
Modeling Spoken Decision Making Dialogue and Optimization of its Dialogue Strategy
Teruhisa Misu
|
Komei Sugiura
|
Kiyonori Ohtake
|
Chiori Hori
|
Hideki Kashioka
|
Hisashi Kawai
|
Satoshi Nakamura
Proceedings of the SIGDIAL 2010 Conference
pdf
bib
abs
Construction of Back-Channel Utterance Corpus for Responsive Spoken Dialogue System Development
Yuki Kamiya
|
Tomohiro Ohno
|
Shigeki Matsubara
|
Hideki Kashioka
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In spoken dialogues, if a spoken dialogue system does not respond at all during a user's utterances, the user might feel uneasy because the user does not know whether or not the system has recognized the utterances. In particular, back-channel utterances, which the system outputs as short voiced responses such as "yeah" and "uh huh" in English, play an important role for a driver in in-car speech dialogues because the driver does not look toward a listener while driving. This paper describes the construction of a back-channel utterance corpus and its analysis to develop a system which can output back-channel utterances at the proper timing in responsive in-car speech dialogue. First, we constructed the back-channel utterance corpus by integrating the back-channel utterances that four subjects provided for the drivers' utterances in 60 dialogues in the CIAIR in-car speech dialogue corpus. Next, we analyzed the corpus and revealed the relation between back-channel utterance timings and information on bunsetsu, clause, pause, and rate of speech. Based on the analysis, we examined the possibility of detecting back-channel utterance timings with a machine learning technique. As a result of the experiment, we confirmed that our technique achieved the same detection capability as a human.
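A minimal sketch of timing detection as binary classification over boundary candidates; the feature set and the scikit-learn classifier below are illustrative assumptions, not the configuration used in the paper.

# Hedged sketch: one row per bunsetsu/clause boundary in a driver's utterance.
# Hypothetical features based on the cues analyzed in the corpus:
# [clause-boundary flag, pause length in seconds, local speech rate].
from sklearn.linear_model import LogisticRegression

X = [
    [1, 0.42, 6.1],
    [0, 0.05, 7.8],
    [1, 0.80, 5.2],
]
y = [1, 0, 1]  # 1 = a human subject provided a back-channel at this boundary

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1, 0.50, 6.0]]))  # decide whether to output "uh huh" here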
pdf
bib
abs
Dialogue Acts Annotation for NICT Kyoto Tour Dialogue Corpus to Construct Statistical Dialogue Systems
Kiyonori Ohtake
|
Teruhisa Misu
|
Chiori Hori
|
Hideki Kashioka
|
Satoshi Nakamura
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper introduces a new corpus of consulting dialogues designed for training a dialogue manager that can handle consulting dialogues through spontaneous interactions, using the tagged dialogue corpus. We have collected more than 150 hours of consulting dialogues in the tourist guidance domain. We are developing a corpus that consists of speech, transcripts, speech act (SA) tags, morphological analysis results, dependency analysis results, and semantic content tags. This paper outlines our taxonomy of dialogue act (DA) annotation, which describes two aspects of an utterance: its communicative function (SA) and its semantic content. We provide an overview of the Kyoto tour dialogue corpus and a preliminary analysis using the DA tags. We also show the result of a preliminary experiment on SA tagging with Support Vector Machines (SVMs). We introduce the current state of the corpus development. In addition, we mention the usage of our corpus for the spoken dialogue system that is being developed.
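As a purely illustrative example of the two annotation aspects described above, one utterance record might pair a communicative-function tag with semantic content tags; the field names and tag values below are hypothetical, not the corpus's actual tag set.

# Hypothetical record; field names and values are illustrative only.
utterance = {
    "speaker": "guide",
    "transcript": "Kinkaku-ji is about fifteen minutes from here by bus.",
    "speech_act": "inform",        # communicative function (SA tag)
    "semantic_content": {          # semantic content tags
        "topic": "access",
        "place": "Kinkaku-ji",
        "transport": "bus",
    },
}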
2009
pdf
bib
Annotating Dialogue Acts to Construct Dialogue Systems for Consulting
Kiyonori Ohtake
|
Teruhisa Misu
|
Chiori Hori
|
Hideki Kashioka
|
Satoshi Nakamura
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)
2007
pdf
bib
Learning Unsupervised SVM Classifier for Answer Selection in Web Question Answering
Youzheng Wu
|
Ruiqiang Zhang
|
Xinhui Hu
|
Hideki Kashioka
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
2006
pdf
bib
abs
A Syntactically Annotated Corpus of Japanese Spoken Monologue
Tomohiro Ohno
|
Shigeki Matsubara
|
Hideki Kashioka
|
Naoto Kato
|
Yasuyoshi Inagaki
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Recently, monologue data such as lectures and commentaries by professionals have come to be regarded as valuable intellectual resources and have been attracting attention. On the other hand, in order to use such monologue data effectively and efficiently, it is necessary not only to accumulate the data but also to structure them. This paper describes the construction of a Japanese spoken monologue corpus in which a dependency structure is given to each utterance. Spontaneous monologue includes many very long sentences composed of two or more clauses. In such sentences, a subject or adverb may be common to multiple clauses and can be considered to depend on multiple predicates. In order to represent this dependency information faithfully, our scheme allows a bunsetsu to depend on multiple bunsetsus.
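A sketch of a representation permitting multiple heads per bunsetsu, as described above; the indices and structure are illustrative only, not the corpus's actual encoding.

# Illustrative only: heads stored as lists so that a shared subject or adverb
# bunsetsu may depend on more than one predicate bunsetsu.
heads = {
    0: [3, 6],  # bunsetsu 0 (shared subject) depends on predicates 3 and 6
    1: [3],
    2: [3],
    4: [6],
    5: [6],
}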
pdf
bib
Dependency Parsing of Japanese Spoken Monologue Based on Clause Boundaries
Tomohiro Ohno
|
Shigeki Matsubara
|
Hideki Kashioka
|
Takehiko Maruyama
|
Yasuyoshi Inagaki
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
pdf
bib
Development of client-server speech translation system on a multi-lingual speech communication platform
Tohru Shimizu
|
Yutaka Ashikari
|
Eiichiro Sumita
|
Hideki Kashioka
|
Satoshi Nakamura
Proceedings of the Third International Workshop on Spoken Language Translation: Papers
2005
pdf
bib
Corpus-oriented Acquisition of Chinese Grammar
Yan Zhang
|
Hideki Kashioka
Proceedings of the Fifth Workshop on Asian Language Resources (ALR-05) and First Symposium on Asian Language Resources Network (ALRN)
pdf
bib
abs
Probabilistic Model for Example-based Machine Translation
Eiji Aramaki
|
Sadao Kurohashi
|
Hideki Kashioka
|
Naoto Kato
Proceedings of Machine Translation Summit X: Papers
Example-based machine translation (EBMT) systems have so far relied on heuristic measures for retrieving translation examples. Such heuristic measures take time to adjust and can make the algorithm unclear. This paper presents a probabilistic model for EBMT. Under the proposed model, the system searches for the combination of translation examples that has the highest probability. The proposed model clearly formalizes the EBMT process. In addition, the model can naturally incorporate the context similarity of translation examples. The experimental results demonstrate that the proposed model has a slightly better translation quality than state-of-the-art EBMT systems.
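In generic notation (ours, not the paper's), the decoding step described above can be written as a search for the example combination C with the highest probability given the source sentence S:

\hat{C} = \arg\max_{C} P(C \mid S),

where the context similarity of each translation example can enter as a factor of P(C \mid S).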
pdf
bib
abs
Word Alignment Viewer for Long Sentences
Hideki Kashioka
Proceedings of Machine Translation Summit X: Posters
An aligned corpus is an important resource for developing machine translation systems. We consider suitable units for constructing the translation model by observing an aligned parallel corpus, and we examine the characteristics of the aligned corpus. Long sentences are especially difficult for word alignment because such sentences can become very complicated; moreover, each (source/target) word has a higher possibility of corresponding to a (target/source) word. This paper introduces an alignment viewer that a developer can use to correct alignment information. We discuss using the viewer on a patent parallel corpus, because sentences in patents are often long and complicated.
pdf
bib
Training Data Modification for SMT Considering Groups of Synonymous Sentences
Hideki Kashioka
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment
2004
pdf
bib
Grouping Synonymous Sentences from a Parallel Corpus
Hideki Kashioka
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2003
pdf
bib
Word Selection for EBMT based on Monolingual Similarity and Translation Confidence
Eiji Aramaki
|
Sadao Kurohashi
|
Hideki Kashioka
|
Hideki Tanaka
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond
pdf
bib
Construction and Analysis of Japanese-English Broadcast News Corpus with Named Entity Tags
Tadashi Kumano
|
Hideki Kashioka
|
Hideki Tanaka
|
Takahiro Fukusima
Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition
pdf
bib
abs
Building a parallel corpus for monologues with clause alignment
Hideki Kashioka
|
Takehiko Maruyama
|
Hideki Tanaka
Proceedings of Machine Translation Summit IX: Papers
Many studies have been reported in the domain of speech-to-speech machine translation systems for travel conversation use. Therefore, a large number of travel domain corpora have become available in recent years. From a wider viewpoint, speech-to-speech systems are required for many purposes other than travel conversation. One of these is monologues (e.g., TV news, lectures, technical presentations). However, in monologues, sentences tend to be long and complicated, which often causes problems for parsing and translation. Therefore, we need a suitable translation unit, rather than the sentence. We propose the clause as a unit for translation. To develop a speech-to-speech machine translation system for monologues based on the clause as the translation unit, we need a monologue parallel corpus with clause alignment. In this paper, we describe how to build a Japanese-English monologue parallel corpus with clauses aligned, and discuss the features of this corpus.
2002
pdf
bib
Comparing and Extracting Paraphrasing Words with 2-Way Bilingual Dictionaries
Kazutaka Takao
|
Kenji Imamura
|
Hideki Kashioka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
pdf
bib
Translation Unit Concerning Timing of Simultaneous Translation
Hideki Kashioka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2001
pdf
bib
abs
An automatic evaluation method for machine translation using two-way MT
Shoichi Yokoyama
|
Hideki Kashioka
|
Akira Kumano
|
Masaki Matsudaira
|
Yoshiko Shirokizawa
|
Shuji Kodama
|
Terumasa Ehara
|
Shinichiro Miyazawa
|
Yuzo Murata
Proceedings of Machine Translation Summit VIII
Evaluation of machine translation is one of the most important issues in this field. We have already proposed a quantitative evaluation method for machine translation systems. Roughly, an example sentence in Japanese is machine translated into English, and then back into Japanese using several systems, and the output Japanese sentences are compared with the original Japanese sentence in terms of word identification, correctness of modification, syntactic dependency, and parataxis. By calculating the score, we could quantitatively evaluate the English machine translation. However, the extraction of word identification and the other features was done by humans, and this affects the correctness of the evaluation. In order to solve this problem, we developed an automatic evaluation system. We report the details of the system in this paper.
pdf
bib
ATR-SLT System for SENSEVAL-2 Japanese Translation Task
Tadashi Kumano
|
Hideki Kashioka
|
Hideki Tanaka
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems
2000
pdf
bib
Translation using Information on Dialogue Participants
Setsuo Yamada
|
Eiichiro Sumita
|
Hideki Kashioka
Sixth Applied Natural Language Processing Conference
pdf
bib
Automatically Expansion of Thesaurus Entries with a Different Thesaurus
Hideki Kashioka
|
Satosi Shirai
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
1999
pdf
bib
Evaluation of Annotation Schemes for Japanese Discourse: Japanese Discourse Tagging Working Group
A. Ichikawa
|
M. Araki
|
Y. Horiuchi
|
M. Ishizaki
|
S. Itabashi
|
W. Itoh
|
H. Kashioka
|
K. Kato
|
H. Kikuchi
|
H. Koiso
|
T. Kumagai
|
A. Kurematsu
|
K. Maekawa
|
S. Nakazato
|
M. Tamoto
|
S. Tutiya
|
Y. Yamashita
|
W. Yoshimura
Towards Standards and Tools for Discourse Tagging
pdf
bib
abs
Applying TDMT to abstracts on science and technology
Hideki Kashioka
|
Hiroko Ohta
|
Yoshiko Shirokizawa
|
Kazutaka Takao
Proceedings of Machine Translation Summit VII
In this paper, we discuss applying a translation model, "Transfer Driven Machine Translation" (TDMT), to document abstracts on science and technology. TDMT, a machine translation model, was developed by ATR-ITL to deal with dialogues in the travel domain. ATR-ITL has reported that the TDMT system efficiently translates multilingual spoken dialogues. However, little is known about its ability to translate written text; therefore, we examined TDMT on written text from English to Japanese, specifically abstracts on science and technology produced by the Japan Science and Technology Corporation (JST). The experimental results show that TDMT can produce written text translations.
pdf
bib
abs
Solutions to problems inherent in spoken-language translation: the ATR-MATRIX approach
Eiichiro Sumita
|
Setsuo Yamada
|
Kazuhide Yamamoto
|
Michael Paul
|
Hideki Kashioka
|
Kai Ishikawa
|
Satoshi Shirai
Proceedings of Machine Translation Summit VII
ATR has built a multi-language speech translation system called ATR-MATRIX. It consists of a spoken-language translation subsystem, which is the focus of this paper, together with a highly accurate speech recognition subsystem and a high-definition speech synthesis subsystem. This paper gives a road map of solutions to the problems inherent in spoken-language translation. Spoken-language translation systems need to tackle difficult problems such as ungrammaticality, contextual phenomena, speech recognition errors, and the high speeds required for real-time use. We have made great strides towards solving these problems in recent years. Our approach mainly uses an example-based translation model called TDMT. We have added the use of extra-linguistic information, a decision tree learning mechanism, and methods for dealing with recognition errors.
pdf
bib
abs
Study on evaluation of WWW MT systems
Shinichiro Miyazawa
|
Shoichi Yokoyama
|
Masaki Matsudaira
|
Akira Kumano
|
Shuji Kodama
|
Hideki Kashioka
|
Yoshiko Shirokizawa
|
Yasuo Nakajima
Proceedings of Machine Translation Summit VII
Compared with off-line machine translation (MT), MT for the WWW involves more evaluation factors, such as translation accuracy of text, interpretation of HTML tags, consistency with various protocols and browsers, and translation speed for net surfing. Moreover, the speed of technical innovation and its practical application is fast, including the appearance of new protocols. Improvement of MT software for the WWW will enable the sharing of information from around the world and make a great contribution to mankind. Despite the importance of general evaluation studies on MT software for the WWW, it appears that such studies have not yet been conducted. Since MT for the WWW will be a critical factor for future international communication, its study and evaluation is an important theme. This study aims at standardized evaluation of MT for the WWW, and suggests an evaluation method focusing on unique aspects of the WWW independent of the text itself. The evaluation method has a wide range of applicability and does not depend on specific languages. Twenty-four items specific to the WWW were evaluated for six MT software packages for the WWW. This study clarified various issues which should be improved in the future regarding MT software for the WWW, as well as issues in the evaluation technology of MT on the Internet.
pdf
bib
abs
Quantitative evaluation of machine translation using two-way MT
Shoichi Yokoyama
|
Akira Kumano
|
Masaki Matsudaira
|
Yoshiko Shirokizawa
|
Mutsumi Kawagoe
|
Shuji Kodama
|
Hideki Kashioka
|
Terumasa Ehara
|
Shinichiro Miyazawa
|
Yasuo Nakajima
Proceedings of Machine Translation Summit VII
One of the most important issues in the field of machine translation is evaluation of the translated sentences. This paper proposes a quantitative method of evaluation for machine translation systems. The method is as follows. First, an example sentence in Japanese is machine translated into English using several Japanese-English machine translation systems. Second, the output English sentences are machine translated into Japanese using several English-Japanese machine translation systems (different from the Japanese-English machine translation systems). Then, each output Japanese sentence is compared with the original Japanese sentence in terms of word identification, correctness of modification, syntactic dependency, and parataxis. An average score is calculated, and this becomes the total evaluation of the machine translation of the sentence. From this two-way machine translation and the calculation of the score, we can quantitatively evaluate the English machine translation. For the present study, we selected 100 Japanese sentences from the abstracts of scientific articles. Each of these sentences has an English translation which was performed by a human. Approximately half of these sentences are evaluated and the results are given. In addition, a comparison of human and machine translations is performed and the trade-off between the two methods of translation is discussed.
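If each comparison aspect yields a sub-score, the per-sentence evaluation described above amounts to an average such as

\mathrm{Score}(s) = \tfrac{1}{4}\left(s_{\mathrm{word}} + s_{\mathrm{mod}} + s_{\mathrm{dep}} + s_{\mathrm{para}}\right),

averaged in turn over the test sentences; the equal weighting here is an assumption for illustration, since the abstract does not specify the weights.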
1998
pdf
bib
Trigger-Pair Predictors in Parsing and Tagging
Ezra Black
|
Andrew Finch
|
Hideki Kashioka
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1
pdf
bib
Use of Mutual Information Based Character Clusters in Dictionary-less Morphological Analysis of Japanese
Hideki Kashioka
|
Yasuhiro Kawata
|
Yumiko Kinjo
|
Andrew Finch
|
Ezra W. Black
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1
pdf
bib
Trigger-Pair Predictors in Parsing and Tagging
Ezra Black
|
Andrew Finch
|
Hideki Kashioka
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics
pdf
bib
Use of Mutual Information Based Character Clusters in Dictionary-less Morphological Analysis of Japanese
Hideki Kashioka
|
Yasuhiro Kawata
|
Yumiko Kinjo
|
Andrew Finch
|
Ezra W. Black
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics
1997
pdf
bib
Probabilistic Parsing of Unrestricted English Text, With a Highly-Detailed Grammar
Ezra Black
|
Stephen Eubank
|
Hideki Kashioka
|
David Magerman
Fifth Workshop on Very Large Corpora
1996
pdf
bib
Beyond Skeleton Parsing: Producing a Comprehensive Large-Scale General-English Treebank With Full Grammatical Analysis
Ezra Black
|
Stephen Eubank
|
Hideki Kashioka
|
David Magerman
|
Roger Garside
|
Geoffrey Leech
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics