Harald Höge

Also published as: Harald Hoege, H. Höge

2020

pdf bib abs
Cortical Speech Databases For Deciphering the Articulatory Code
Harald Höge
Proceedings of the Twelfth Language Resources and Evaluation Conference

The paper relates to following ‘AC-hypotheses’: The articulatory code (AC) is a neural code exchanging multi-item messages between the short-term memory and cortical areas as the vSMC and STG. In these areas already neurons active in the presence of articulatory features have been measured. The AC codes the content of speech segmented in chunks and is the same for both modalities - speech perception and speech production. Each AC-message is related to a syllable. The items of each message relate to coordinated articulatory gestures composing the syllable. The mechanism to transport the AC and to segment the auditory signal is based on Ɵ/γ-oscillations, where a Ɵ-cycle has the duration of a Ɵ-syllable. The paper describes the findings from neuroscience, phonetics and the science of evolution leading to the AC-hypotheses. The paper proposes to verify the AC-hypotheses by measuring the activity of all ensembles of neurons coding and decoding the AC. Due to state of the art, the cortical measurements to be prepared, done and further processed need a high effort from scientists active in different areas. We propose to launch a project to produce cortical speech databases with cortical recordings synchronized with the speech signal allowing to decipher the articulatory code.

2008

pdf bib abs
Evaluation of Modules and Tools for Speech Synthesis: the ECESS Framework
Harald Höge | Zdravko Kacic | Bojan Kotnik | Matej Rojc | Nicolas Moreau | Horst-Udo Hain
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The consortium ECESS (European Center of Excellence for Speech Synthesis) has set up a framework for evaluation of software modules and tools relevant for speech synthesis. Till now two lines of evaluation campaigns have been established: (1) Evaluation of the ECESS TTS modules (text processing, prosody, acoustic synthesis). (2) Evaluation of ECESS tools (pitch extraction, voice activity detection, phonetic segmentation). The functionality and interfaces of the ECESS TTS have been developed by a joint effort between ECESS and the EC-funded project TC-STAR . First evaluation campaigns were conducted within TC-STAR using the ECESS framework. As TC-STAR finished in March 2007, ECESS continued and extended the evaluation of ECESS TTS modules and tools by its own. Within the paper we describe a novel framework which allows performing remote evaluation for modules via the web. First experimental results are reported. Further the result of several evaluation campaigns for tools handling pitch extraction and voice activity detection are presented.

2006

pdf bib abs
Human and machine recognition as a function of SNR
Bernt Andrassy | Harald Hoege
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In-car automatic speech recognition (ASR) is usually evaluated behaviour for different levels of noise. Yet this is interesting for car manufacturers in order to predict system performances for different speeds and different car models and thus allow to design speech based applications in a better way. It therefore makes sense to split the single WER into SNR dependent WERs, where SNR stands for the signal to noise ratio, which is an appropriate measure for the noise level. In this paper a SNR measure based on the concept of the Articulation Index is developed, which allows the direct comparison with human recognition performance.

In the framework of the EU funded project TC-STAR (Technology and Corpora for Speech to Speech Translation),research on TTS aims on providing a synthesized voice sounding like the source speaker speaking the target language. To progress in this direction, research is focused on naturalness, intelligibility, expressivity and voice conversion both, in the TC-STAR framework. For this purpose, specifications on large, high quality TTS databases have been developed and the data have been recorded for UK English, Spanish and Mandarin. The development of speech technology in TC-STAR is evaluation driven. Assessment of speech synthesis is needed to determine how well a system or technique performs in comparison to previous versions as well as other approaches (systems & methods). Apart from testing the whole system, all components of the system will be evaluated separately. This approach grants better assesment of each component as well as identification of the best techniques in the different speech synthesisprocesses.This paper describes the specifications of Language Resources for speech synthesis and the specifications for evaluation of speech synthesis activities.

2004

pdf bib
Evaluation of Microphone Array Front-Ends for ASR - an Extension of the AURORA Framework
Harald Höge | Josef G. Bauer | Christian Geißler | Panji Setiawan | Kai Steinert
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib abs
SALA II Across the Finish Line: A Large Collection of Mobile Telephone Speech Databases from North and Latin America completed
Henk van den Heuvel | Phil Hall | Harald Höge | Asunción Moreno | Antonio Rincon | Francesco Senia
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The SALA II project comprises mobile telephone recordings according to the SpeechDat (II) paradigm for several languages in North and Latin America. Each database contains the recordings of 1000 speakers, with the exception of US Spanish (2000 speakers) and US English (4000 speakers). A quarter of the recordings of each database are made respectively in a quiet environment (home/office), in the street, in a public place, and in a moving vehicle. This paper presents an evaluation of the project. The paper details on experiences with respect to the implementation of design specifications, speaker recruitment, data recordings (on site), data processing, orthographic transcription and lexicon generation. Furthermore, the validation procedure and its results are documented. Finally, the availability and distribution of the databases are addressed.