Felix Burkhardt


2022

A Comparative Cross Language View On Acted Databases Portraying Basic Emotions Utilising Machine Learning
Felix Burkhardt | Anabell Hacker | Uwe Reichel | Hagen Wierstorf | Florian Eyben | Björn Schuller
Proceedings of the Thirteenth Language Resources and Evaluation Conference

For several decades, emotional databases have been recorded by various laboratories. Many of them contain acted portrayals of Darwin’s famous “big four” basic emotions. In this paper, we investigate to what extent a selection of them are comparable, using two approaches: on the one hand, modeling similarity as performance in cross-database machine learning experiments, and on the other, analyzing a manually picked set of four acoustic features that represent different phonetic areas. It is interesting to see to what extent specific databases (we added a synthetic one) perform well as a training set for others, while some do not. Generally speaking, we found indications of both similarity and specificity across languages.

Nkululeko: A Tool For Rapid Speaker Characteristics Detection
Felix Burkhardt | Johannes Wagner | Hagen Wierstorf | Florian Eyben | Björn Schuller
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present advancements in a software tool called Nkululeko, which lets users perform (semi-)supervised machine learning experiments in the speaker characteristics domain. It is based on audformat, a format for speech database metadata description. Thanks to an interface based on configurable templates, it supports best practice and very fast setup of experiments without the need to be proficient in the underlying language, Python. The paper explains the handling of Nkululeko and presents two typical experiments: comparing expert acoustic features with artificial neural net embeddings for emotion classification and speaker age regression.

SyntAct: A Synthesized Database of Basic Emotions
Felix Burkhardt | Florian Eyben | Björn Schuller
Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference

Speech emotion recognition has been a focus of research for several decades and has many applications. One problem is sparse data for supervised learning. One way to tackle this problem is to synthesize data with emotion-simulating speech synthesis approaches. We present a synthesized database of five basic emotions and neutral expression, based on rule-based manipulation for a diphone synthesizer, which we release to the public. The database has been validated in several machine learning experiments as a training set to detect emotional expression from natural speech data. The scripts to generate such a database have been made open source and could be used to aid speech emotion recognition for a low-resourced language, as MBROLA supports 35 languages.

2016

A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance
Felix Burkhardt | Uwe D. Reichel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Current state-of-the-art speech synthesizers for domain-independent systems still struggle with the challenge of generating understandable and natural-sounding speech. This is mainly because the pronunciation of words of foreign origin, inflections and compound words often cannot be handled by rules. Furthermore, there are too many of these for inclusion in exception dictionaries. We describe an approach to evaluating text-to-speech synthesizers with a subjective listening experiment. The focus is to differentiate between known problem classes for speech synthesizers. The target language is German, but we believe that many of the described phenomena are not language specific. We distinguish the following problem categories: Normalization, Foreign linguistics, Natural writing, Language specific and General. Each of them is divided into three to five problem classes. Word lists for each of the above-mentioned categories were compiled and synthesized by both a commercial and an open-source synthesizer, both based on the non-uniform unit-selection approach. The synthesized speech was evaluated by human judges using the Speechalyzer toolkit, and the results are discussed. The evaluation shows that, as expected, the commercial synthesizer performs much better than the open-source one, and that words of foreign origin in particular were pronounced badly by both systems.

2012

“You Seem Aggressive!” Monitoring Anger in a Practical Application
Felix Burkhardt
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A monitoring system to detect emotional outbursts in day-to-day communication is presented. The anger monitor was tested in a household and, in parallel, in an office environment. Although the state of the art of emotion recognition seems sufficient for practical applications, the acquisition of good training material remains a difficult task, as cross-database performance is too low to be used in this context. A solution will probably consist of combining carefully drafted general training databases with the development of usability concepts to (re-)train the monitor in the field.

Fast Labeling and Transcription with the Speechalyzer Toolkit
Felix Burkhardt
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe a software tool named “Speechalyzer”, which is optimized to process large speech data sets with respect to transcription, labeling and annotation. It is implemented as a client-server framework in Java and interfaces with software for speech recognition, synthesis, speech classification and quality evaluation. Its main applications are processing training data for speech recognition and classification models, and performing benchmarking tests on speech-to-text, text-to-speech and speech categorization software systems.

2010

A Database of Age and Gender Annotated Telephone Speech
Felix Burkhardt | Martin Eckert | Wiebke Johannsen | Joachim Stegmann
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This article describes an age-annotated database of German telephone speech. In total, 47 hours of prompted and free text were recorded, uttered by 954 paid participants in a style typical for automated voice services. The participants were selected based on an equal distribution of males and females within four age cluster groups: children, youth, adults and seniors. Among the children, gender is not distinguished, because it doesn’t have a strong enough effect on the voice. The textual content was designed to be typical for automated voice services and consists mainly of short commands, single words and numbers. An additional database consists of 659 speakers (368 female and 291 male) who called an automated voice portal server and answered freely one of the two questions “What is your favourite dish?” and “What would you take to an island?” (island set, 422 speakers). This data might be used for out-of-domain testing. The data will be used to tune an age-detecting automated voice service and might be released to research institutes under controlled conditions as part of an open age and gender detection challenge.