Florian Schiel

2022

This paper presents the results of an ongoing collaboration to develop an Arabic variety-independent romanization system that aims to homogenize and simplify the romanization of the Arabic script, and introduces an Arabic variety-independent WebMAUS service offering a free to use forced-alignment service fully integrated within the WebMAUS services. We present the rationale for developing such a system, highlighting the need for a detailed romanization system with graphemes corresponding to the phonemic short and long vowels/consonants in Arabic varieties. We describe how the acoustic model was created, followed by several hands-on recipes for applying the forced alignment webservice either online or programatically. Finally, we discuss some of the issues we faced during the development of the system.

2018

pdf bib

MOCCA: Measure of Confidence for Corpus Analysis - Automatic Reliability Check of Transcript and Automatic Segmentation
Thomas Kisler | Florian Schiel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib

Evaluation of Automatic Formant Trackers
Florian Schiel | Thomas Zitzelsberger
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib

A Web Service for Pre-segmenting Very Long Transcribed Speech Recordings
Nina Poerner | Florian Schiel
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib abs

The BAS Speech Data Repository
Uwe Reichel | Florian Schiel | Thomas Kisler | Christoph Draxler | Nina Pörner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The BAS CLARIN speech data repository is introduced. At the current state it comprises 31 pre-dominantly German corpora of spoken language. It is compliant to the CLARIN-D as well as the OLAC requirements. This enables its embedding into several infrastructures. We give an overview over its structure, its implementation as well as the corpora it contains.

pdf bib abs

BAS Speech Science Web Services - an Update of Current Developments
Thomas Kisler | Uwe Reichel | Florian Schiel | Christoph Draxler | Bernhard Jackl | Nina Pörner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In 2012 the Bavarian Archive for Speech Signals started providing some of its tools from the field of spoken language in the form of Software as a Service (SaaS). This means users access the processing functionality over a web browser and therefore do not have to install complex software packages on a local computer. Amongst others, these tools include segmentation & labeling, grapheme-to-phoneme conversion, text alignment, syllabification and metadata generation, where all but the last are available for a variety of languages. Since its creation the number of available services and the web interface have changed considerably. We give an overview and a detailed description of the system architecture, the available web services and their functionality. Furthermore, we show how the number of files processed over the system developed in the last four years.

2014

pdf bib abs

Untrained Forced Alignment of Transcriptions and Audio for Language Documentation Corpora using WebMAUS
Jan Strunk | Florian Schiel | Frank Seifart
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Language documentation projects supported by recent funding intiatives have created a large number of multimedia corpora of typologically diverse languages. Most of these corpora provide a manual alignment of transcription and audio data at the level of larger units, such as sentences or intonation units. Their usefulness both for corpus-linguistic and psycholinguistic research and for the development of tools and teaching materials could, however, be increased by achieving a more fine-grained alignment of transcription and audio at the word or even phoneme level. Since most language documentation corpora contain data on small languages, there usually do not exist any speech recognizers or acoustic models specifically trained on these languages. We therefore investigate the feasibility of untrained forced alignment for such corpora. We report on an evaluation of the tool (Web)MAUS (Kisler, 2012) on several language documentation corpora and discuss practical issues in the application of forced alignment. Our evaluation shows that (Web)MAUS with its existing acoustic models combined with simple grapheme-to-phoneme conversion can be successfully used for word-level forced alignment of a diverse set of languages without additional training, especially if a manual prealignment of larger annotation units is already avaible.

pdf bib abs

German Alcohol Language Corpus - the Question of Dialect
Florian Schiel | Thomas Kisler
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Speech uttered under the influence of alcohol is known to deviate from the speech of the same person when sober. This is an important feature in forensic investigations and could also be used to detect intoxication in the automotive environment. Aside from acoustic-phonetic features and speech content which have already been studied by others in this contribution we address the question whether speakers use dialectal variation or dialect words more frequently when intoxicated than when sober. We analyzed 300,000 recorded word tokens in read and spontaneous speech uttered by 162 female and male speakers within the German Alcohol Language Corpus. We found that contrary to our expectations the frequency of dialectal forms decreases significantly when speakers are under the influence. We explain this effect with a compensatory over-shoot mechanism: speakers are aware of their intoxication and that they are being monitored. In forensic analysis of speech this ‘awareness factor’ must be taken into account.

2012

pdf bib abs

Statistical Evaluation of Pronunciation Encoding
Iris Merkus | Florian Schiel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this study we investigate the idea to automatically evaluate newly created pronunciation encodings for being correct or containing a potential error. Using a cascaded triphone detector and phonotactical n-gram modeling with an optimal Bayesian threshold we classify unknown pronunciation transcripts into the classes 'probably faulty' or 'probably correct'. Transcripts tagged 'probably faulty' are forwarded to a manual inspection performed by an expert, while encodings tagged 'probably correct' are passed without further inspection. An evaluation of the new method on the German PHONOLEX lexical resource shows that with a tolerable error margin of approximately 3% faulty transcriptions a major reduction in work effort during the production of a new lexical resource can be achieved.

2010

pdf bib abs

BAStat : New Statistical Resources at the Bavarian Archive for Speech Signals
Florian Schiel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A new type of language resource called 'BAStat' has been released by the Bavarian Archive for Speech Signals at Ludwig Maximilians Universitaet, Munich. In contrast to primary resources like speech and text corpora BAStat comprises statistical estimates based on a number of primary spoken language resources: first and second order occurrence probability of phones, syllables and words, duration statistics, probabilities of pronunciation variants of words and probabilities of context information. Unlike other statistical speech resources BAStat is based solely on recordings of conversational German and therefore models spoken language not text. The resource consists of a bundle of 7-bit ASCII tables and matrices to maximize inter-operability between different operation systems and can be downloaded for free from the BAS web-site. This contribution gives a detailed description about the empirical basis, the contained data types, the format of the resulting statistical data, some interesting interpretations of grand figures and a brief comparison to the text-based statistical resource CELEX.

2008

pdf bib abs

Talking and Looking: the SmartWeb Multimodal Interaction Corpus
Florian Schiel | Hannes Mögele
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Nowadays portable devices such as smart phones can be used to capture the face of a user simultaneously with the voice input. Server based or even embedded dialogue system might utilize this additional information to detect whether the speaking user addresses the system or other parties or whether the listening user is focused on the display or not. Depending on these findings the dialogue system might change its strategy to interact with the user improving the overall communication between human and system. To develop and test methods for On/Off-Focus detection a multimodal corpus of user-machine interactions was recorded within the German SmartWeb project. The corpus comprises 99 recording sessions of a triad communication between the user, the system and a human companion. The user can address/watch/listen to the system but also talk to his companion, read from the display or simply talk to herself. Facial video is captured with a standard built-in video camera of a smart phone while voice input in being recorded by a high quality close microphone as well as over a realistic transmission line via Bluetooth and WCDMA. The resulting SmartWeb Video Corpus (SVC) can be obtained from the Bavarian Archive for Speech Signals.

pdf bib abs

F0 of Adolescent Speakers - First Results for the German Ph@ttSessionz Database
Christoph Draxler | Florian Schiel | Tania Ellbogen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The first release of the German Ph@ttSessionz speech database contains read and spontaneous speech from 864 adolescent speakers and is the largest database of its kind for German. It was recorded via the WWW in over 40 public schools in all dialect regions of Germany. In this paper, we present a cross-sectional study of f0 measurements on this database. The study documents the profound changes in male voices at the age 13-15. Furthermore, it shows that on a perceptive mel-scale, there is little difference in the relative f0 variability for male and female speakers. A closer analysis reveals that f0 variability is dependent on the speech style and both the length and the type of the utterance. The study provides statistically reliable voice parameters of adolescent speakers for German. The results may contribute to making spoken dialog systems more robust by restricting user input to utterances with low f0 variability.

pdf bib abs

ALC: Alcohol Language Corpus
Florian Schiel | Christian Heinrich | Sabine Barfüßer | Thomas Gilg
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

A number of forensic studies published during the last 50 years report that intoxication with alcohol influences speech in a way that is made manifest in certain features of the speech signal. However, most of these studies are based on data that are not publicly available nor of statistically sufficient size. Furthermore, in spite of the positive reports nobody ever successfully implemented a method to detect alcoholic intoxication from the speech signal. The Alcohol Language Corpus (ALC) aims to answer these open questions by providing a publicly available large and statistically sound corpus of intoxicated and sober speech. This paper gives a detailed description of the corpus features and methodology. Also, we will present some preliminary results on a series of verifications about reported potential features that are claimed to reliably indicate alcoholic intoxication.

2006

pdf bib abs

SmartWeb UMTS Speech Data Collection: The SmartWeb Handheld Corpus
Hannes Mögele | Moritz Kaiser | Florian Schiel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we outline the German speech data collection for the SmartWeb project, which is fundedby the German Ministry of Science and Education. We focus on the SmartWeb Handheld Corpus (SHC), which has been collected by the Bavarian Archive for Speech Signals (BAS) at the Phonetic Institute (IPSK) of Munich University. Signals of SHC are being recorded in real-life environments(indoor and outdoor) with real background noise as well as real transmission line errors. We developed a new elicitation method and recording technique, calledsituational prompting, which facilitates collecting realistic dialogue speech data in a cost efficient way. We can show that almost realistic speech queries to a dialogue system issued over a mobile PDA or smart phonecan be collected very efficiently using an automatic speech server. We describe the technical and linguistic features of the resulting speech corpus, which will bepublicly available at BAS or ELDA.

pdf bib abs

Bikers Accessing the Web: The SmartWeb Motorbike Corpus
Moritz Kaiser | Hannes Mögele | Florian Schiel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Three advanced German speech corpora have been collected during theGerman SmartWeb project. One of them, the SmartWeb MotorbikeCorpus (SMC) is described in this paper. As with all SmartWeb speech corpora SMC is designed for a dialogue system dealing with open domains. The corpus is recorded under the special circumstances of a motorbike ride and contains utterances of the driver related to information retrieval from various sources and different topics. Audio tracks show characteristic noise from the engine and surrounding traffic as well as drop outs caused by the transmission over Bluetooth and the UMTS mobile network. We discuss the problems of the technical setup and the fully automatic evocation of natural-spoken queries by means of dialogue-like sequences.