Aveen Jalal Mohammed
2026
Fleurs-Badini: Translation and Recording Fleurs Dataset for Badini Variant of Northern Kurdish
Mohammad Mohammadamini | Dilgash Mohammed Salih Tayib | Dezheen H. Abdulazeez | Barzan Hussein Mohammed | Imad Saeed Sadeeq | Aveen Jalal Mohammed | Amera Ismail Melhum | Abuobaida Abdullah Dheyab
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Multilingual speech benchmarks such as FLEURS have significantly advanced research across a wide range of languages. However, important dialects, including Badini Kurdish, remain underrepresented, which limits benchmarking in automatic speech recognition (ASR) and speech-to-text translation (S2TT). To address this limitation, this study introduces FLEURS-Badini, a dialect-focused extension designed to support research on Northern Kurdish (Badini). The dataset is constructed through a structured process of translation, recording, and validation, resulting in 5,224 utterances paired with their corresponding translated text. The data were collected from 45 speakers. To evaluate the dataset, baseline experiments are conducted using state-of-the-art models for both ASR and S2TT. The results indicate that ASR remains challenging: the best-performing model, W2V-BERT CTC, reaches a Word Error Rate (WER) of approximately 55% on the test set. Speech-to-text translation performance is similarly limited, with BLEU scores of 6.13 and 5.24 on the dev and test sets, respectively. Overall, FLEURS-Badini expands multilingual coverage and provides a standardized foundation for evaluating ASR and speech translation systems in the Badini dialect.
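For readers unfamiliar with the ASR metric quoted above, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the text normalization or scoring tool used for the reported results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are obtained by whitespace splitting, which
    is an assumption and not necessarily the scoring setup of the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of roughly 55% corresponds to about 55 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 100% when the hypothesis is much longer than the reference.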