Petros Karatsareas
2026
Extending ASR Evaluation Resources for Modern Greek Dialects
Chara Tsoukala | Stavros Bompolas | Antigoni Margariti | Konstantina Panagiotou | Maria Elisavet Plaiti | Nefeli Tzanakaki | Petros Karatsareas | Angela Ralli | Antonios Anastasopoulos | Stella Markantonatou
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Recent progress in Automatic Speech Recognition (ASR) has primarily benefited high-resource standard languages, while dialectal speech remains challenging and underexplored. We present an expanded benchmark for low-resource Modern Greek dialects, covering Aperathiot, Cretan, Lesbian, and Cappadocian, spanning southern, northern, and contact-influenced varieties with varying degrees of divergence from Standard Modern Greek (SMG). The benchmark provides dialectal transcriptions in the Greek alphabet, following SMG-based orthographic conventions while preserving dialectal lexical and morphophonological forms. Using this benchmark, we evaluate state-of-the-art multilingual ASR models both in a zero-shot setting and after fine-tuning on each dialect. Zero-shot results show that performance degrades with dialectal distance from SMG: the best word error rates (WERs) range from about 60-70% for southern dialects to over 80% for Lesbian and nearly 97% for Cappadocian. Fine-tuning substantially reduces error rates (up to 47% relative WER improvement), but Cappadocian remains the most challenging variety (best WER 68.17%). Overall, our results highlight persistent limitations of current pretrained ASR models under dialectal variation and the need for dedicated benchmarks and adaptation strategies.
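For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how a word error rate and a relative WER improvement are commonly computed. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, tokenization is assumed to be plain whitespace splitting with no text normalization, and the numbers in the usage lines are made up for illustration rather than taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are obtained by whitespace splitting, with no
    casing, punctuation, or accent normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not results from the paper).
example_wer = word_error_rate("a b c d", "a x c")
example_gain = relative_wer_reduction(0.80, 0.60)
```

Because WER divides by the reference length, a hypothesis with many insertions can score above 100%. A relative improvement is measured against the baseline WER, so it is not the same as the difference in percentage points.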