Wolfgang Bigenzahn
2016
A Database of Laryngeal High-Speed Videos with Simultaneous High-Quality Audio Recordings of Pathological and Non-Pathological Voices
Philipp Aichinger
|
Immer Roesner
|
Matthias Leonhard
|
Doris-Maria Denk-Linnert
|
Wolfgang Bigenzahn
|
Berit Schneider-Stickler
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Auditory voice quality judgements are used intensively for the clinical assessment of pathological voice. Voice quality concepts are fuzzily defined and poorly standardized however, which hinders scientific and clinical communication. The described database documents a wide variety of pathologies and is used to investigate auditory voice quality concepts with regard to phonation mechanisms. The database contains 375 laryngeal high-speed videos and simultaneous high-quality audio recordings of sustained phonations of 80 pathological and 40 non-pathological subjects. Interval wise annotations regarding video and audio quality, as well as voice quality ratings are provided. Video quality is annotated for the visibility of anatomical structures and artefacts such as blurring or reduced contrast. Voice quality annotations include ratings on the presence of dysphonia and diplophonia. The purpose of the database is to aid the formulation of observationally well-founded models of phonation and the development of model-based automatic detectors for distinct types of phonation, especially for clinically relevant nonmodal voice phenomena. Another application is the training of audio-based fundamental frequency extractors on video-based reference fundamental frequencies.