Processing Unknown Words in Continuous Speech Recognition

Kenji Kita, Terumasa Ehara, Tsuyoshi Morimoto


Abstract
Current continuous speech recognition systems essentially ignore unknown words: they are designed to recognize only the words in their lexicon. However, processing unknown words is essential if speech recognition systems are to be used in real spoken-language processing applications. This paper proposes a continuous speech recognition method that accepts any utterance, including utterances that contain unknown words. In this method, words not in the lexicon are transcribed as phone sequences, while words in the lexicon are still recognized correctly. The baseline is the HMM-LR speech recognition system, which integrates Hidden Markov Models with generalized LR parsing. It is enhanced with a syllable trigram model to capture the stochastic characteristics of the language. Preliminary results indicate that the approach is very promising.
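The abstract does not say how the syllable trigram probabilities are estimated or smoothed. As a rough illustration only, the Python sketch below builds a toy syllable trigram model and uses it to score syllable strings. Everything specific here is an assumption, not the paper's method: the class name `SyllableTrigram`, the linear-interpolation smoothing with fixed weights (0.6, 0.3, 0.1), the fallback to the lower-order estimate when a context was never observed, the add-one unigram floor with a single shared slot for unseen syllables, and the romanized training syllables.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence-boundary padding symbols


class SyllableTrigram:
    """Toy syllable trigram LM (hypothetical sketch, not the paper's estimator).

    P(c | a, b) linearly interpolates trigram, bigram, and unigram estimates.
    If a context was never seen, the next lower-order estimate is used in its
    place, so each P(. | a, b) stays a proper distribution.
    """

    def __init__(self, corpus, lambdas=(0.6, 0.3, 0.1)):
        self.l3, self.l2, self.l1 = lambdas  # assumed interpolation weights
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.bi_ctx, self.tri_ctx = Counter(), Counter()
        for syllables in corpus:
            seq = [BOS, BOS] + list(syllables) + [EOS]
            for i in range(2, len(seq)):
                a, b, c = seq[i - 2], seq[i - 1], seq[i]
                self.uni[c] += 1
                self.bi[(b, c)] += 1
                self.bi_ctx[b] += 1
                self.tri[(a, b, c)] += 1
                self.tri_ctx[(a, b)] += 1
        self.total = sum(self.uni.values())
        self.vocab_size = len(self.uni)

    def prob(self, a, b, c):
        # Add-one unigram; the extra +1 is one shared slot for all unseen syllables.
        p1 = (self.uni[c] + 1) / (self.total + self.vocab_size + 1)
        # Relative frequencies, falling back to the lower order for unseen contexts.
        p2 = self.bi[(b, c)] / self.bi_ctx[b] if self.bi_ctx[b] else p1
        p3 = self.tri[(a, b, c)] / self.tri_ctx[(a, b)] if self.tri_ctx[(a, b)] else p2
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1

    def log_prob(self, syllables):
        # Sum of log P(s_i | s_{i-2}, s_{i-1}) over the padded sequence.
        seq = [BOS, BOS] + list(syllables) + [EOS]
        return sum(math.log(self.prob(seq[i - 2], seq[i - 1], seq[i]))
                   for i in range(2, len(seq)))


# Hypothetical romanized training data: each utterance as a syllable list.
corpus = [
    ["ko", "n", "ni", "chi", "wa"],
    ["ko", "n", "ba", "n", "wa"],
    ["a", "ri", "ga", "to", "u"],
]
model = SyllableTrigram(corpus)
familiar = model.log_prob(["ko", "n", "ni", "chi", "wa"])
scrambled = model.log_prob(["wa", "chi", "ni", "n", "ko"])
print(f"familiar={familiar:.2f} scrambled={scrambled:.2f}")
```

With these toy counts the familiar syllable order receives a much higher log-probability than the scrambled one. This is the kind of preference a syllable-level model can contribute when an input segment matches no lexicon entry; how such scores are combined with HMM-LR parsing in the paper is not reproduced here.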
Anthology ID:
1991.iwpt-1.16
Volume:
Proceedings of the Second International Workshop on Parsing Technologies
Month:
February 13-25
Year:
1991
Address:
Cancun, Mexico
Editors:
Masaru Tomita, Martin Kay, Robert Berwick, Eva Hajicova, Aravind Joshi, Ronald Kaplan, Makoto Nagao, Yorick Wilks
Venue:
IWPT
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
136–142
URL:
https://aclanthology.org/1991.iwpt-1.16
Cite (ACL):
Kenji Kita, Terumasa Ehara, and Tsuyoshi Morimoto. 1991. Processing Unknown Words in Continuous Speech Recognition. In Proceedings of the Second International Workshop on Parsing Technologies, pages 136–142, Cancun, Mexico. Association for Computational Linguistics.
Cite (Informal):
Processing Unknown Words in Continuous Speech Recognition (Kita et al., IWPT 1991)
PDF:
https://aclanthology.org/1991.iwpt-1.16.pdf