Lea Fischbach

2025

EDAudio: Easy Data Augmentation for Dialectal Audio
Lea Fischbach | Akbar Karimi | Alfred Lameli | Lucie Flek
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

We investigate lightweight and easily applicable data augmentation techniques for dialectal audio classification. We evaluate four main methods, namely shifting pitch, interval removal, background noise insertion and interval swap as well as several subvariants on recordings from 20 German dialects. Each main method is tested across multiple hyperparameter combinations, inlcuding augmentation length, coverage ratio and number of augmentations per original sample. Our results show that frequency-based techniques, particularly frequency masking, consistently yield performance improvements, while others such as time masking or speaker-based insertion can negatively affect the results. Our comparative analysis identifies which augmentations are most effective under realistic conditions, offering simple and efficient strategies to improve dialectal speech classification.

pdf bib abs

Does Preprocessing Matter? An Analysis of Acoustic Feature Importance in Deep Learning for Dialect Classification
Lea Fischbach | Caroline Kleen | Lucie Flek | Alfred Lameli
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

This paper examines the effect of preprocessing techniques on spoken dialect classification using raw audio data. We focus on modifying Root Mean Square (RMS) amplitude, DC-offset, articulation rate (AR), pitch, and Harmonics-to-Noise Ratio (HNR) to assess their impact on model performance. Our analysis determines whether these features are important, irrelevant, or misleading for the classification task. To evaluate these effects, we use a pipeline that tests the significance of each acoustic feature through distortion and normalization techniques. While preprocessing did not directly improve classification accuracy, our findings reveal three key insights: deep learning models for dialect classification are generally robust to variations in the tested audio features, suggesting that normalization may not be necessary. We identify articulation rate as a critical factor, directly affecting the amount of information in audio chunks. Additionally, we demonstrate that intonation, specifically the pitch range, plays a vital role in dialect recognition.

2024

pdf bib abs

A Comparative Analysis of Speaker Diarization Models: Creating a Dataset for German Dialectal Speech
Lea Fischbach
Proceedings of the Third Workshop on NLP Applications to Field Linguistics

Speaker diarization is a critical task in the field of computer science, aiming to assign timestamps and speaker labels to audio segments. The aim of these tests in this Publication is to find a pretrained speaker diarization pipeline capable of distinguishing dialectal speakers from each other and an explorer. To achieve this, three pipelines, namely Pyannote, CLEAVER and NeMo, are tested and compared, across various segmentation and parameterization strategies. The study considers multiple scenarios, such as the impact of threshold values, overlap handling, and minimum duration parameters, on classification accuracy. Additionally, this study aims to create a dataset for German dialect identification (DID) based on the findings from this research.

Co-authors

Venues

Fix author