Kenji Kita

Error Correction of Speech Recognition Outputs Using Generalized LR Parsing and Confusion Matrix
Tatsuya Iwasa | Kenji Kita
ROCLING 1995 Poster Papers

Probabilistic Language Modeling Based On Mixture Probabilistic Context-Free Grammar
Kenji Kita | Tatsuya Iwasa
Proceedings of Rocling VIII Computational Linguistics Conference VIII

1994

Application of Corpora in Second Language Learning: The Problem of Collocational Knowledge Acquisition
Kenji Kita | Takashi Omoto | Yoneo Yano | Yasuhiko Kato
Second Workshop on Very Large Corpora

While corpus-based studies are now becoming a new methodology in natural language processing, second language learning offers one interesting potential application. In this paper, we are primarily concerned with the acquisition of collocational knowledge from corpora for use in language learning. First we discuss the importance of collocational knowledge in second language learning, and then take up two measures, mutual information and cost criteria, for automatically identifying or extracting collocations from corpora. Comparative experiments are made between the two measures using both Japanese and English corpora. In our experiments, the cost criteria measure proved more effective in extracting interesting collocations such as fundamental idiomatic expressions and phrases.
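
As a rough illustration of the first of the two measures named in the abstract, the sketch below scores adjacent word pairs by pointwise mutual information, log2(P(x,y) / (P(x)P(y))). This is the standard textbook formulation and not necessarily the exact variant used in the paper; the function name `pmi_scores`, the `min_count` cutoff, and the restriction to adjacent pairs are assumptions made for the example. The cost criteria measure is not shown because the abstract does not define it.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (in bits).

    Hypothetical helper for illustration only: probabilities are plain
    relative frequencies, and pairs seen fewer than `min_count` times
    are skipped because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    corpus = "a b a b".split()
    for pair, score in sorted(pmi_scores(corpus).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Pairs with high scores co-occur more often than their individual frequencies would predict, which makes them candidate collocations; in practice the frequency cutoff matters because very rare pairs tend to receive inflated scores.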

1991

Processing Unknown Words in Continuous Speech Recognition
Kenji Kita | Terumasa Ehara | Tsuyoshi Morimoto
Proceedings of the Second International Workshop on Parsing Technologies

Current continuous speech recognition systems essentially ignore unknown words. Systems are designed to recognize words in the lexicon. However, for using speech recognition systems in real applications of spoken-language processing, it is very important to process unknown words. This paper proposes a continuous speech recognition method which accepts any utterance that might include unknown words. In this method, words not in the lexicon are transcribed as phone sequences, while words in the lexicon are recognized correctly. The HMM-LR speech recognition system, which is an integration of Hidden Markov Models and generalized LR parsing, is used as the baseline system, and enhanced with the trigram model of syllables to take into account the stochastic characteristics of a language. Preliminary results indicate that our approach is very promising.

1989

Parsing Continuous Speech by HMM-LR Method
Kenji Kita | Takeshi Kawabata | Hiroaki Saito
Proceedings of the First International Workshop on Parsing Technologies

This paper describes a speech parsing method called HMM-LR. In HMM-LR, an LR parsing table is used to predict phones in speech input, and the system drives an HMM-based speech recognizer directly without any intervening structures such as a phone lattice. Very accurate, efficient speech parsing is achieved through the integrated processes of speech recognition and language analysis. The HMM-LR method is applied to large-vocabulary speaker-dependent Japanese phrase recognition. The recognition rate is 87.1% for the top candidates and 97.7% for the five best candidates.