Fleurs-Badini: Translation and Recording Fleurs Dataset for Badini Variant of Northern Kurdish

Mohammad Mohammadamini, Dilgash Mohammed Salih Tayib, Dezheen H. Abdulazeez, Barzan Hussein Mohammed, Imad Saeed Sadeeq, Aveen Jalal Mohammed, Amera Ismail Melhum, Abuobaida Abdullah Dheyab


Abstract
Multilingual speech benchmarks such as the FLEURS benchmark have significantly advanced research across a wide range of languages. However, important dialects, including Badini Kurdish, remain underrepresented, limiting benchmarking in automatic speech recognition (ASR) and speech-to-text translation (S2TT). To address this limitation, this study introduces FLEURS-Badini, a dialect-focused extension designed to support research on Northern Kurdish (Badini). The dataset is constructed through a structured process of translation, recording, and validation, resulting in 5,224 utterances paired with their corresponding translated text. The data were collected from 45 speakers. To evaluate the dataset, baseline experiments are conducted using state-of-the-art models for both ASR and S2TT. The results indicate that ASR remains challenging, with the best performance achieved by the W2V-BERT CTC model, reaching a Word Error Rate (WER) of approximately 55% on the test set. Similarly, speech-to-text translation performance is limited, with BLEU scores of 6.13 and 5.24 on the dev and test sets, respectively. Overall, FLEURS-Badini expands multilingual coverage and provides a standardized foundation for evaluating ASR and speech translation systems in the Badini dialect.
Anthology ID:
2026.iwslt-1.14
Volume:
Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month:
July
Year:
2026
Address:
San Diego, USA (in-person and online)
Editors:
Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues:
IWSLT | WS
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
Pages:
119–123
URL:
https://aclanthology.org/2026.iwslt-1.14/
Cite (ACL):
Mohammad Mohammadamini, Dilgash Mohammed Salih Tayib, Dezheen H. Abdulazeez, Barzan Hussein Mohammed, Imad Saeed Sadeeq, Aveen Jalal Mohammed, Amera Ismail Melhum, and Abuobaida Abdullah Dheyab. 2026. Fleurs-Badini: Translation and Recording Fleurs Dataset for Badini Variant of Northern Kurdish. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 119–123, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal):
Fleurs-Badini: Translation and Recording Fleurs Dataset for Badini Variant of Northern Kurdish (Mohammadamini et al., IWSLT 2026)
PDF:
https://aclanthology.org/2026.iwslt-1.14.pdf