2012
AVATecH — automated annotation through audio and video analysis
Przemyslaw Lenkiewicz, Binyam Gebrekidan Gebre, Oliver Schreer, Stefano Masneri, Daniel Schneider, Sebastian Tschöpel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In several fields of the humanities, such as linguistics, psychology, and anthropology, annotations of multimodal resources are a necessary component of the research workflow. However, creating these annotations is very laborious: it can take 50 to 100 times the duration of the annotated media, or more. This effort can be reduced significantly by applying innovative audio and video processing algorithms that analyze the recordings and provide automated annotations. This is the aim of the AVATecH project, a collaboration of the Max Planck Institute for Psycholinguistics (MPI) and the Fraunhofer institutes HHI and IAIS. In this paper we present a set of results of automated annotation together with an evaluation of their quality.
2011
AVATecH: Audio/Video Technology for Humanities Research
Sebastian Tschöpel, Daniel Schneider, Rolf Bardeli, Oliver Schreer, Stefano Masneri, Peter Wittenburg, Han Sloetjes, Przemek Lenkiewicz, Eric Auer
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage
2010
ELAN as Flexible Annotation Framework for Sound and Image Processing Detectors
Eric Auer, Albert Russel, Han Sloetjes, Peter Wittenburg, Oliver Schreer, S. Masnieri, Daniel Schneider, Sebastian Tschöpel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Annotation of digital recordings in humanities research is still, to a large extent, performed manually. This paper describes the first pattern-recognition-based software components developed in the AVATecH project and their integration into the annotation tool ELAN. AVATecH (Advancing Video/Audio Technology in Humanities Research) is a project that involves two Max Planck Institutes (Max Planck Institute for Psycholinguistics, Nijmegen; Max Planck Institute for Social Anthropology, Halle) and two Fraunhofer Institutes (Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS, Sankt Augustin; Fraunhofer Heinrich-Hertz-Institute, Berlin), and that aims to develop and implement audio and video technology for semi-automatic annotation of heterogeneous media collections as they occur in multimedia-based research. The highly diverse nature of the digital recordings stored in the archives of both Max Planck Institutes poses a huge challenge to most existing pattern recognition solutions and is a motivation to make such technology available to researchers in the humanities.
DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain
Doris Baum, Daniel Schneider, Rolf Bardeli, Jochen Schwenninger, Barbara Samlowski, Thomas Winkler, Joachim Köhler
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Typical broadcast material contains not only studio-recorded texts read by trained speakers, but also spontaneous and dialect speech, debates with cross-talk, voice-overs, and on-site reports with difficult acoustic environments. Standard approaches to speech and speaker recognition usually deteriorate under such conditions. This paper reports on the design, construction, and experimental analysis of DiSCo, a German corpus for the evaluation of speech and speaker recognition on challenging material from the broadcast domain. A key requirement in designing this corpus was good coverage of different types of serious programmes beyond clean, planned-speech broadcast news. Corpus annotation encompasses manual segmentation, orthographic transcription, and labelling with speech mode, dialect, and noise type. We indicate typical use cases for the corpus by reporting results from ASR, speech search, and speaker recognition on the new corpus, thereby obtaining insights into the difficulty of audio recognition for the various classes.