2018
Towards Processing of the Oral History Interviews and Related Printed Documents
Zbyněk Zajíc | Lucie Skorkovská | Petr Neduchal | Pavel Ircing | Josef V. Psutka | Marek Hrúz | Aleš Pražák | Daniel Soutner | Jan Švec | Lukáš Bureš | Luděk Müller
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2006
Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech
Pavel Ircing | Jan Hoidekr | Josef Psutka
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In our paper, we present a method for incorporating available linguistic information into a statistical language model that is used in an ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use morphological tags as the basis for the word-to-class mapping. Since the number of distinct tags is at least an order of magnitude lower than the number of words, even in tasks with moderately sized vocabularies, the tag-based model can be estimated rather robustly even from relatively small text corpora. Unfortunately, this robustness goes hand in hand with the restricted predictive ability of the class-based model. Hence we apply a two-pass recognition strategy, in which the first pass is performed with a standard word-based n-gram model and the resulting lattices are rescored in the second pass using the aforementioned class-based model. Using this decoding scenario, we managed to moderately improve the word error rate in the ASR experiments performed.
Benefit of a Class-based Language Model for Real-time Closed-captioning of TV Ice-hockey Commentaries
Jan Hoidekr | J.V. Psutka | Aleš Pražák | Josef Psutka
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This article describes a real-time speech recognition system for closed-captioning of TV ice-hockey commentaries. Automatic transcription of the TV commentary accompanying an ice-hockey match is usually a hard task because the commentator's spontaneous speech is often embedded in very loud background noise created by the crowd, music, sirens, drums, whistles, etc. Data for building this system were collected from 41 matches that were played during the World Championships in 2000, 2001, and 2002 and were broadcast by Czech TV channels. The real-time closed-captioning system is based on a class-based language model designed after careful analysis of the training data and of OOV words in new (previously unseen) commentaries, with the goal of decreasing the OOV (out-of-vocabulary) rate and increasing recognition accuracy.
2004
Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH project
Josef Psutka | Pavel Ircing | Jan Hajič | Vlasta Radová | Josef V. Psutka | William J. Byrne | Samuel Gustman
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2001
Robust Knowledge Discovery from Parallel Speech and Text Sources
F. Jelinek | W. Byrne | S. Khudanpur | B. Hladká | H. Ney | F. J. Och | J. Cuřín | J. Psutka
Proceedings of the First International Conference on Human Language Technology Research