Stefan Petrik
2010
Example-Based Automatic Phonetic Transcription
Christina Leitner
|
Martin Schickbichler
|
Stefan Petrik
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Current state-of-the-art systems for automatic phonetic transcription (APT) are mostly phone recognizers based on Hidden Markov models (HMMs). We present a different approach for APT especially designed for transcription with a large inventory of phonetic symbols. In contrast to most systems which are model-based, our approach is non-parametric using techniques derived from concatenative speech synthesis and template-based speech recognition. This example-based approach not only produces draft transcriptions that just need to be corrected instead of created from scratch but also provides a validation mechanism for ensuring consistency within the corpus. Implementations of this transcription framework are available as standalone Java software and extension to the ELAN linguistic annotation software. The transcription system was tested with audio files and reference transcriptions from the Austrian Pronunciation Database (ADABA) and compared to an HMM-based system trained on the same data set. The example-based and the HMM-based system achieve comparable phone recognition rates. A combination of rule-based and example-based APT in a constrained phone recognition scenario returned the best results.
2008
The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech
Konrad Hofbauer
|
Stefan Petrik
|
Horst Hering
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Air traffic control (ATC) is based on voice communication between pilots and controllers and uses a highly task and domain specific language. Due to this very reason, spoken language technologies for ATC require domain-specific corpora, of which only few exist to this day. The ATCOSIM Air Traffic Control Simulation Speech corpus is a speech database of non-prompted and clean ATC operator speech. It consists of ten hours of speech data, which were recorded in typical ATC control room conditions during ATC real-time simulations. The database includes orthographic transcriptions and additional information on speakers and recording sessions. The ATCOSIM corpus is publicly available and provided online free of charge. In this paper, we first give an overview of ATC related corpora and their shortcomings. We then show the difficulties in obtaining operational ATC speech recordings and propose the use of existing ATC real-time simulations. We describe the recording, transcription, production and validation process of the ATCOSIM corpus, and outline an application example for automatic speech recognition in the ATC domain.